no result when searching sentences in solr

2011-09-21 Thread hadi
I index some PDF and DOCX files with SolrJ, and when I query for
sentences like "We'd be glad to have you accompany" or anything else, the
result is empty. Is there some configuration needed for this?
I should mention that I am creating the query in /solr/browse.



boost a document which has a field not empty

2011-09-21 Thread Zoltan Altfatter
Hi,

I have one entity called organisation. I am indexing its name to be able
to search on it afterwards.
I also store the website of the organisation. Some organisations have a
website, some don't.
When searching for organisations, can I arrange that, even among name
matches, those which have a website are shown first?

Thank you.

Regards,
Zoltan


Solr Indexing - Null Values in date field

2011-09-21 Thread mechravi25
Hi,

I have a field in my source with data type as string and that field has NULL
values. I am trying to index this field in solr as a date data type with
multivalued = true. Following is the entry for that field in my schema.xml

<field name="startdate" type="date" indexed="true" stored="true"
multiValued="true" required="false" />

When I try to index, I get the following exception

org.apache.solr.common.SolrException: Invalid Date String:''
at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
at 
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:95)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:618)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)


I even tried using an IFNULL condition in my query for that field (e.g.
IFNULL(startdate,'') and also IFNULL(startdate,NULL)), but I am still getting
the same exception.

Is there any way to index the null values as such in date field? 

Please help.





Re: Solr Indexing - Null Values in date field

2011-09-21 Thread Gora Mohanty
On Wed, Sep 21, 2011 at 4:08 PM, mechravi25 mechrav...@yahoo.co.in wrote:
 Hi,

 I have a field in my source with data type as string and that field has NULL
 values. I am trying to index this field in solr as a date data type with
 multivalued = true. Following is the entry for that field in my schema.xml
[...]

One cannot have NULL values as input for Solr date fields. The
multivalued part is irrelevant here.

As it seems like you are getting the input data from a database,
you will need to supply some invalid date for NULL date values.
E.g., with mysql, we have:
COALESCE( CreationDate, STR_TO_DATE( '1970,1,1', '%Y,%m,%d' ) )
The required syntax will be different for other databases.
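As a rough sketch, the COALESCE goes into the DIH entity query; the entity,
table, and column names here are made up for illustration:

    <entity name="item"
            query="SELECT id,
                          COALESCE(startdate, STR_TO_DATE('1970,1,1', '%Y,%m,%d'))
                              AS startdate
                   FROM items">
    </entity>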

Regards,
Gora


Fuzzy Suggester

2011-09-21 Thread O. Klein
From http://wiki.apache.org/solr/Suggester:

JaspellLookup can provide fuzzy suggestions, though this functionality is
not currently exposed (it's a one line change in JaspellLookup).

Anybody know what change this would have to be?



Problem using EdgeNGram

2011-09-21 Thread Kissue Kissue
Hi,

I am using Solr 3.3 with SolrJ. I am trying to use EdgeNGram to power the auto
suggest feature in my application. My understanding is that using EdgeNGram
means that results will only be returned for records starting with the
search criteria, but this is not happening for me.

For example, if I search for "tr", I get results like the following:

Greenham Trading 6
IT Training Publications
AA Training

Below are details of my configuration:

<fieldType name="edgytext" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="15" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="businessName" type="edgytext" indexed="true" stored="true"
    required="true" omitNorms="true" omitTermFreqAndPositions="true" />

Any ideas why this is happening will be much appreciated.

Thanks.


JSON response with SolrJ

2011-09-21 Thread Kissue Kissue
Hi,

I am using Solr 3.3 with SolrJ. Does anybody know how I can
retrieve a JSON response with SolrJ? Is it possible? It seems to be more
focused on XML and beans.

Thanks.


Re: JSON response with SolrJ

2011-09-21 Thread Parvin Gasimzade
Hi,

A similar question was asked before; maybe it can help:
http://lucene.472066.n3.nabble.com/SolrJ-Response-JSON-td1002024.html
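
As far as I know, SolrJ itself parses responses into Java objects (javabin or
XML), so for raw JSON you would typically bypass SolrJ and hit the HTTP API
with wt=json. A minimal sketch, assuming the default example URL and a single
core:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class JsonQuery {
        public static void main(String[] args) throws Exception {
            // wt=json asks Solr for a JSON response instead of XML/javabin
            URL url = new URL("http://localhost:8983/solr/select?q=*:*&wt=json");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);  // raw JSON, ready to hand to any parser
            }
            in.close();
        }
    }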

On Wed, Sep 21, 2011 at 3:01 PM, Kissue Kissue kissue...@gmail.com wrote:

 Hi,

 I am using solr 3.3 with SolrJ. Does anybody have any idea how i can
 retrieve JSON response with SolrJ? Is it possible? It seems to be more
 focused on XML and Beans.

 Thanks.



Re: Problem using EdgeNGram

2011-09-21 Thread O. Klein
Try using KeywordTokenizerFactory instead of StandardTokenizerFactory to get
the results you want.
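
Applied to your fieldType, that would look roughly like this (untested; only
the tokenizer lines change, the rest is kept as you posted it):

    <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With KeywordTokenizerFactory the whole business name stays one token, so edge
n-grams are generated only from the start of the full name, not from the start
of every word; that is why "tr" then stops matching "Greenham Trading" and
"IT Training Publications".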





Re: boost a document which has a field not empty

2011-09-21 Thread Alexei Martchenko
Can you assign a doc boost at index time?

2011/9/21 Zoltan Altfatter altfatt...@gmail.com

 Hi,

 I have one entity called organisation. I am indexing their name to be able
 to search afterwards on their name.
 I store also the website of the organisation. Some organisations have a
 website some don't.
 Can I achieve that when searching for organisations even if I have a match
 on their name I will show first those which have a website.

 Thank you.

 Regards,
 Zoltan




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Sort five random Top Offers to the top

2011-09-21 Thread MOuli
Hey Community.

I have a Lucene/Solr index with many offers. Some of them are marked as top
offers by a flag field "topoffer". Now I want to sort 5 of these offers
randomly to the top.

For Example
HTC Sensation
 - topoffer = true
HTC Desire
 - topoffer = false
Samsung Galaxy S2
 - topoffer = true
IPhone 4
 - topoffer = true 
...

When I search for a "Handy", I want the first 3 offers to be HTC Sensation,
Samsung Galaxy S2 and the iPhone 4.


Does anyone have an idea?

PS: I hope my English is not too bad.



Re: boost a document which has a field not empty

2011-09-21 Thread Ahmet Arslan
 I have one entity called organisation. I am indexing their
 name to be able
 to search afterwards on their name.
 I store also the website of the organisation. Some
 organisations have a
 website some don't.
 Can I achieve that when searching for organisations even if
 I have a match
 on their name I will show first those which have a
 website.

Which query parser are you using? lucene? (e)dismax?

If lucene (default one), you can add an optional clause to your query:

q=+(some query) website:[* TO *]^10 (assuming you have OR as default operator)

If dismax, there is a bq parameter which accepts lucene query syntax 
bq=website:[* TO *]^10

http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
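
For example, with dismax (the qf field name here is just illustrative):

    /solr/select?defType=dismax&qf=name&q=some+query&bq=website:[*+TO+*]^10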


Re: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Robert Muir
On Tue, Sep 20, 2011 at 12:32 PM, Michael McCandless
luc...@mikemccandless.com wrote:

 Or: is it possible you reopened the reader several times against the
 index (ie, after committing from Solr)?  If so, I think 2.9.x never
 unmaps the mapped areas, and so this would accumulate against the
 system limit.

In order to unmap in Lucene 2.9.x you must specifically turn this
unmapping on with setUseUnmapHack(true)

-- 
lucidimagination.com


Re: boost a document which has a field not empty

2011-09-21 Thread Zoltan Altfatter
Yes, I am using edismax and the bq parameter did the trick. Thanks a lot.

On Wed, Sep 21, 2011 at 3:59 PM, Ahmet Arslan iori...@yahoo.com wrote:

  I have one entity called organisation. I am indexing their
  name to be able
  to search afterwards on their name.
  I store also the website of the organisation. Some
  organisations have a
  website some don't.
  Can I achieve that when searching for organisations even if
  I have a match
  on their name I will show first those which have a
  website.

 Which query parser are you using? lucene? (e)dismax?

 If lucene (default one), you can add an optional clause to your query:

 q=+(some query) website:[* TO *]^10 (assuming you have OR as default
 operator)

 If dismax, there is a bq parameter which accepts lucene query syntax
 bq=website:[* TO *]^10

 http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29



LocalParams, bq, and highlighting

2011-09-21 Thread Demian Katz
I've run into another strange behavior related to LocalParams syntax in Solr 
1.4.1.  If I apply Dismax boosts using bq in LocalParams syntax, the contents 
of the boost queries get used by the highlighter.  Obviously, when I use bq as 
a separate parameter, this is not an issue.

To clarify, here are two searches that yield identical results but different 
highlighting behaviors:

http://localhost:8080/solr/biblio/select/?q=john&rows=20&start=0&indent=yes&qf=author^100&qt=dismax&bq=author%3Asmith^1000&fl=score&hl=true&hl.fl=*

http://localhost:8080/solr/biblio/select/?q=%28%28_query_%3A%22{!dismax+qf%3D\%22author^100\%22+bq%3D\%27author%3Asmith^1000\%27}john%22%29%29&rows=20&start=0&indent=yes&fl=score&hl=true&hl.fl=*

Query #1 highlights only john (the desired behavior), but query #2 highlights 
both john and smith.

Is this a known limitation of the highlighter, or is it a bug?  Is this issue 
resolved in newer versions of Solr?

thanks,
Demian


Selective values for facets

2011-09-21 Thread ntsrikanth
Hi,

The dataset I have is for special offers.
We have a lot of offer codes, but I need to create a few facets for specific
conditions only.

For example, I have the following codes: ABCD, AGTR, KUYH, NEWY, NEWA, NEWB,
EAS1, EAS2

And I need to create facets like
'New Year Offers' mapped to NEWA, NEWB, NEWY and
'Easter Offers' mapped to EAS1, EAS2

I don't want the other codes returned in the facet when I query it. How can I
have the other values ignored when the facet is built?
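
For reference, the closest I've come up with so far is one facet.query per
bucket (assuming the codes live in a field called code), but I'm not sure it's
the right approach:

    facet=true
    &facet.query={!key='New Year Offers'}code:(NEWA OR NEWB OR NEWY)
    &facet.query={!key='Easter Offers'}code:(EAS1 OR EAS2)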

Thanks,
Srikanth NT





Best Practices for indexing nested XML in Solr via DIH

2011-09-21 Thread Pulkit Singhal
Hello Everyone,

I was wondering what are the various best practices that everyone
follows for indexing nested XML into Solr. Please don't feel limited
by examples, feel free to share your own experiences.

Given an xml structure such as the following:
<categoryPath>
  <category>
    <id>cat001</id>
    <name>Everything</name>
  </category>
  <category>
    <id>cat002</id>
    <name>Music</name>
  </category>
  <category>
    <id>cat003</id>
    <name>Pop</name>
  </category>
</categoryPath>

How do you make the best use of the data when indexing?

1) Do you use Scenario A?
categoryPath_category_id = cat001 cat002 cat003 (flattened)
categoryPath_category_name = Everything Music Pop (flattened)
If so then how do you manage to find the corresponding
categoryPath_category_id if someone's search matches a value in the
categoryPath_category_name field? I understand that Solr is not about
lookups but this may be important information for you to display right
away as part of the search results page rendering.

2) Do you use Scenario B?
categoryPath_category_id = [cat001 cat002 cat003] (the [] signifies a
multi-value field)
categoryPath_category_name = [Everything Music Pop] (the [] signifies
a multi-value field)
And once again how do you find associated data sets once something matches.
Side Question: How can one configure DIH to store the data this way
for Scenario B?
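
For the side question, the closest sketch I have (untested; it assumes a
FileDataSource and that both columns are multiValued in schema.xml) sets
forEach to the document root, so a repeated xpath produces a multi-valued
column:

    <entity name="product" processor="XPathEntityProcessor"
            url="product.xml" forEach="/categoryPath">
      <field column="categoryPath_category_id"   xpath="/categoryPath/category/id"/>
      <field column="categoryPath_category_name" xpath="/categoryPath/category/name"/>
    </entity>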

Thanks!
- Pulkit


Re: How to write core's name in log

2011-09-21 Thread Pulkit Singhal
Not sure if this is a good lead for you but when I run out-of-the-box
multi-core example-DIH instance of Solr, I often see core name thrown
about in the logs. Perhaps you can look there?
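
For the MDC approach in the quoted message below, a minimal sketch of a
servlet filter (untested; assumes log4j, that the filter is mapped before
SolrDispatchFilter in web.xml, and the way the core name is pulled out of the
path is purely illustrative):

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import org.apache.log4j.MDC;

    public class CoreNameFilter implements Filter {
        public void init(FilterConfig config) {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            // crude core-name extraction from /solr/<core>/... (illustrative only)
            String[] parts = ((HttpServletRequest) req).getRequestURI().split("/");
            if (parts.length > 2) {
                MDC.put("core", parts[2]);
            }
            try {
                chain.doFilter(req, res);
            } finally {
                MDC.remove("core");  // don't leak the value to the next request on this thread
            }
        }
    }

Then %X{core} in the log4j ConversionPattern picks it up, as described below.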

On Thu, Sep 15, 2011 at 6:50 AM, Joan joan.monp...@gmail.com wrote:
 Hi,

 I have multiple cores in Solr and I want to write the core name to the log
 through log4j.

 I've found a method in SolrException called log(Logger log, Throwable e), but
 the exception it builds doesn't carry the core's name.

 The exception message is built in the toStr() method of the SolrException
 class, so I want to include the core's name in that message.

 I'm thinking of adding an MDC variable holding the name of the core, and then
 using it in the log4j configuration via the ConversionPattern %X{core}.

 The idea is that when Solr receives a request, I'll set this variable to the
 name of the core.

 But I don't know if it's a good idea or not.

 Or does a solution already exist for adding the name of the core to the log?

 Thanks

 Joan



Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Shawn Heisey

On 9/20/2011 4:09 PM, Robert Muir wrote:
yes, mergeFactor=10 is interpreted as both segmentsPerTier and
maxMergeAtOnce. yes, specifying explicit TieredMP parameters will
override whatever you set in mergeFactor (which is basically only
interpreted to be backwards compatible). this is why i created this
confusing test configuration: to test this exact case.


I've got a checked out lucene_solr_3_4 and this isn't what I'm seeing.
Solr Implementation Version: 3.4-SNAPSHOT 1173320M - root - 2011-09-21 
09:58:58


With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to 
be ignored.  I've got both set to 35, but Solr is merging every 10 
segments.  I haven't tried explicitly setting mergeFactor yet to see if 
that will make the other settings override it, I'm letting the current 
import finish first.


Here's the relevant config pieces.  These two sections are in separate 
files incorporated into solrconfig.xml using xinclude:


<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

Thanks,
Shawn



strange copied field problem

2011-09-21 Thread Tanner Postert
I have 3 fields that I am working with: genre, genre_search and text. genre
is a string field which comes from the data source; genre_search is a text
field that is copied from genre; and text is a text field that is copied
from genre_search and a few other fields. The text field is the default search
field for queries. When I search for q=genre_search:indie+rock, Solr returns
several records that have both Indie and Rock as genres, which
is great, but when I search for q=indie+rock or q=text:indie+rock, I get no
results.

Why would the source field return results when the destination doesn't?
Both genre_search and text are the same data type, so there shouldn't be any
strange translations happening.


Re: FW: MMapDirectory failed to map a 23G compound index segment

2011-09-21 Thread Yongtao Liu
I hit a similar issue recently.
I'm not sure if MMapDirectory is the right way to go.

When an index file is mapped to RAM, the JVM calls the OS file-mapping
function. The memory usage counts as shared memory, so it may not be
attributed to the JVM process space.

One problem I saw is that if the index file is bigger than physical RAM, and
there are a lot of queries causing wide access across the index file,
then the machine has no available memory
and the system becomes very slow.

What I did was change the Lucene code to disable MMapDirectory.

On Wed, Sep 21, 2011 at 1:26 PM, Yongtao Liu y...@commvault.com wrote:



 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Tuesday, September 20, 2011 3:33 PM
 To: solr-user@lucene.apache.org
 Subject: Re: MMapDirectory failed to map a 23G compound index segment

 Since you hit OOME during mmap, I think this is an OS issue not a JVM
 issue.  Ie, the JVM isn't running out of memory.

 How many segments were in the unoptimized index?  It's possible the OS
 rejected the mmap because of process limits.  Run cat
 /proc/sys/vm/max_map_count to see how many mmaps are allowed.

 Or: is it possible you reopened the reader several times against the index
 (ie, after committing from Solr)?  If so, I think 2.9.x never unmaps the
 mapped areas, and so this would accumulate against the system limit.

  My memory of this is a little rusty but isn't mmap also limited by mem +
 swap on the box? What does 'free -g' report?

 I don't think this should be the case; you are using a 64 bit OS/JVM so in
 theory (except for OS system wide / per-process limits imposed) you should
 be able to mmap up to the full 64 bit address space.

 Your virtual memory is unlimited (from ulimit output), so that's good.

 Mike McCandless

 http://blog.mikemccandless.com

 On Wed, Sep 7, 2011 at 12:25 PM, Rich Cariens richcari...@gmail.com
 wrote:
  Ahoy ahoy!
 
  I've run into the dreaded OOM error with MMapDirectory on a 23G cfs
  compound index segment file. The stack trace looks pretty much like
  every other trace I've found when searching for OOM  map failed[1].
  My configuration
  follows:
 
  Solr 1.4.1/Lucene 2.9.3 (plus
  SOLR-1969https://issues.apache.org/jira/browse/SOLR-1969
  )
  CentOS 4.9 (Final)
  Linux 2.6.9-100.ELsmp x86_64 yada yada yada Java SE (build
  1.6.0_21-b06) Hotspot 64-bit Server VM (build 17.0-b16, mixed mode)
  ulimits:
 core file size (blocks, -c) 0
 data seg size(kbytes, -d) unlimited
 file size (blocks, -f) unlimited
 pending signals(-i) 1024
 max locked memory (kbytes, -l) 32
 max memory size (kbytes, -m) unlimited
 open files(-n) 256000
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 stack size(kbytes, -s) 10240
 cpu time(seconds, -t) unlimited
 max user processes (-u) 1064959
 virtual memory(kbytes, -v) unlimited
 file locks(-x) unlimited
 
  Any suggestions?
 
  Thanks in advance,
  Rich
 
  [1]
  ...
  java.io.IOException: Map failed
   at sun.nio.ch.FileChannelImpl.map(Unknown Source)
   at
  org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(Unknown
  Source)
   at
  org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(Unknown
  Source)
   at org.apache.lucene.store.MMapDirectory.openInput(Unknown Source)
   at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(Unknown
  Source)

   at org.apache.lucene.index.SegmentReader.get(Unknown Source)
   at org.apache.lucene.index.SegmentReader.get(Unknown Source)
   at org.apache.lucene.index.DirectoryReader.<init>(Unknown Source)
   at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(Unknown
  Source)
   at org.apache.lucene.index.DirectoryReader$1.doBody(Unknown Source)
   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown
  Source)
   at org.apache.lucene.index.DirectoryReader.open(Unknown Source)
   at org.apache.lucene.index.IndexReader.open(Unknown Source) ...
  Caused by: java.lang.OutOfMemoryError: Map failed
   at sun.nio.ch.FileChannelImpl.map0(Native Method) ...
 



SolrCloud state

2011-09-21 Thread Miguel Coxo
Hi there.

I'm starting a new project using Solr and I would like to know if Solr is
able to run as a cluster with fault tolerance.

I'm setting up an environment with two shards. Each shard should have a
replica.

What I would like to know is: if a shard master fails, will the replica be
promoted to master? Or will it remain search-only and only recover when
a new master is set up?

Also, how is document indexing distributed across the shards? Can I add a new
shard dynamically?

All the best, Miguel Coxo.


Re: strange copied field problem

2011-09-21 Thread Pulkit Singhal
I am NOT claiming that making a copy of a copy field is wrong or leads
to a race condition. I don't know that. BUT did you try to copy into
the text field directly from the genre field? Instead of the
genre_search field? Did that yield working queries?
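
For reference, the direct version is just these two schema.xml lines (a sketch
only; as far as I know copyField directives don't chain, i.e. a field that is
populated only by copyField contributes nothing onward, which would explain
the behavior described below):

    <copyField source="genre" dest="genre_search"/>
    <copyField source="genre" dest="text"/>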

On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
tanner.post...@gmail.com wrote:
 i have 3 fields that I am working with: genre, genre_search and text. genre
 is a string field which comes from the data source. genre_search is a text
 field that is copied from genre, and text is a text field that is copied
 from genre_search and a few other fields. Text field is the default search
 field for queries. When I search for q=genre_search:indie+rock, solr returns
 several records that have both Indie as a genre and Rock as a genre, which
 is great, but when I search for q=indie+rock or q=text:indie+rock, i get no
 results.

 Why would the source field return the value and the destination wouldn't.
 Both genre_search and text are the same data type, so there shouldn't be any
 strange translations happening.



Re: strange copied field problem

2011-09-21 Thread Tanner Postert
I believe that was the original configuration, but I can switch it back and
see if that yields any results.

On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal pulkitsing...@gmail.comwrote:

 I am NOT claiming that making a copy of a copy field is wrong or leads
 to a race condition. I don't know that. BUT did you try to copy into
 the text field directly from the genre field? Instead of the
 genre_search field? Did that yield working queries?

 On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
 tanner.post...@gmail.com wrote:
  i have 3 fields that I am working with: genre, genre_search and text.
 genre
  is a string field which comes from the data source. genre_search is a
 text
  field that is copied from genre, and text is a text field that is copied
  from genre_search and a few other fields. Text field is the default
 search
  field for queries. When I search for q=genre_search:indie+rock, solr
 returns
  several records that have both Indie as a genre and Rock as a genre,
 which
  is great, but when I search for q=indie+rock or q=text:indie+rock, i get
 no
  results.
 
  Why would the source field return the value and the destination wouldn't.
  Both genre_search and text are the same data type, so there shouldn't be
 any
  strange translations happening.
 



Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Pulkit Singhal
Usually any good piece of Java code refrains from catching Throwable,
so that Errors bubble up, unlike exceptions. Having said that,
perhaps someone on the list can help if you share which particular
Solr version you are using where you suspect that the Error is being
eaten up.

On Fri, Sep 16, 2011 at 2:47 PM, Jason Toy jason...@gmail.com wrote:
 I have solr issues where I keep running out of memory. I am working on
 solving the memory issues (this will take a long time), but in the meantime,
 I'm trying to be notified when the error occurs.  I saw with the jvm I can
 pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time
 the out of memory issue occurs though my script never runs. Does solr let
 the error bubble up so that the jvm can call this script? If not how can I
 have a script run when solr gets an out of memory issue?



Re: Solr Indexing - Null Values in date field

2011-09-21 Thread Pulkit Singhal
Also, you may use the script transformer to explicitly remove the field
from the document if the field is null. I do this for all my sdouble
and sdate fields ... it's a bit manual, and I would like to see Solr
enhanced to simply skip stuff like this by having a flag for its DIH
code, but until then it suffices:

... transformer="DateFormatTransformer,script:skipEmptyFields"

  <script><![CDATA[
    function skipEmptyFields(row) {
        var regularPrice = row.get('regularPrice');
        if (regularPrice == null || regularPrice == '') {
            row.remove('regularPrice');
        }
        var salePrice = row.get('salePrice');
        if (salePrice == null || salePrice == '') {
            row.remove('salePrice');
        }
        return row;
    }
  ]]></script>



On Wed, Sep 21, 2011 at 6:06 AM, Gora Mohanty g...@mimirtech.com wrote:
 On Wed, Sep 21, 2011 at 4:08 PM, mechravi25 mechrav...@yahoo.co.in wrote:
 Hi,

 I have a field in my source with data type as string and that field has NULL
 values. I am trying to index this field in solr as a date data type with
 multivalued = true. Following is the entry for that field in my schema.xml
 [...]

 One cannot have NULL values as input for Solr date fields. The
 multivalued part is irrelevant here.

 As it seems like you are getting the input data from a database,
 you will need to supply some invalid date for NULL date values.
 E.g., with mysql, we have:
 COALESCE( CreationDate, STR_TO_DATE( '1970,1,1', '%Y,%m,%d' ) )
 The required syntax will be different for other databases.

 Regards,
 Gora



Debugging DIH by placing breakpoints

2011-09-21 Thread Pulkit Singhal
Hello,

I was wondering where I can find the source code for DIH? I want to
check out the source and step through it breakpoint by breakpoint to
understand it better :)

Thanks!
- Pulkit


Re: strange copied field problem

2011-09-21 Thread Tanner Postert
sure enough that worked. could have sworn we had it this way before, but
either way, that fixed it. Thanks.

On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert
tanner.post...@gmail.comwrote:

 i believe that was the original configuration, but I can switch it back and
 see if that yields any results.


 On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal 
 pulkitsing...@gmail.comwrote:

 I am NOT claiming that making a copy of a copy field is wrong or leads
 to a race condition. I don't know that. BUT did you try to copy into
 the text field directly from the genre field? Instead of the
 genre_search field? Did that yield working queries?

 On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
 tanner.post...@gmail.com wrote:
  i have 3 fields that I am working with: genre, genre_search and text.
 genre
  is a string field which comes from the data source. genre_search is a
 text
  field that is copied from genre, and text is a text field that is copied
  from genre_search and a few other fields. Text field is the default
 search
  field for queries. When I search for q=genre_search:indie+rock, solr
 returns
  several records that have both Indie as a genre and Rock as a genre,
 which
  is great, but when I search for q=indie+rock or q=text:indie+rock, i get
 no
  results.
 
  Why would the source field return the value and the destination
 wouldn't.
  Both genre_search and text are the same data type, so there shouldn't be
 any
  strange translations happening.
 





Re: Debugging DIH by placing breakpoints

2011-09-21 Thread Gora Mohanty
On Thu, Sep 22, 2011 at 12:08 AM, Pulkit Singhal
pulkitsing...@gmail.com wrote:
 Hello,

 I was wondering where I can find the source code for DIH? I want to
 check out the source and step through it breakpoint by breakpoint to
 understand it better :)

Should be under contrib/dataimporthandler in your Solr source
tree.

Regards,
Gora


Re: Debugging DIH by placing breakpoints

2011-09-21 Thread Pulkit Singhal
Correct! With that additional info, plus
http://wiki.apache.org/solr/HowToContribute (ant eclipse), plus a
refreshed (close/open) eclipse project ... I'm all set.

Thanks Again.

On Wed, Sep 21, 2011 at 1:43 PM, Gora Mohanty g...@mimirtech.com wrote:
 On Thu, Sep 22, 2011 at 12:08 AM, Pulkit Singhal
 pulkitsing...@gmail.com wrote:
 Hello,

 I was wondering where I can find the source code for DIH? I want to
 check out the source and step through it breakpoint by breakpoint to
 understand it better :)

 Should be under contrib/dataimporthandler in your Solr source
 tree.

 Regards,
 Gora



Production Issue: SolrJ client throwing this error even though field type is not defined in schema

2011-09-21 Thread roz dev
Hi All

We are getting this error in our Production Solr Setup.

Message: Element type "t_sort" must be followed by either attribute
specifications, ">" or "/>".
Solr version is 1.4.1.

The stack trace indicates that Solr is returning a malformed document.


Caused by: org.apache.solr.client.solrj.SolrServerException: Error
executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at
[row,col]:[3,136974]
Message: Element type "t_sort" must be followed by either attribute
specifications, ">" or "/>".
at 
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 21 more


Re: strange copied field problem

2011-09-21 Thread Pulkit Singhal
No probs. I would still hope someone would comment on your thread with
some expert opinions about making a copy of a copy :)

On Wed, Sep 21, 2011 at 1:38 PM, Tanner Postert
tanner.post...@gmail.com wrote:
 sure enough that worked. could have sworn we had it this way before, but
 either way, that fixed it. Thanks.

 On Wed, Sep 21, 2011 at 11:01 AM, Tanner Postert
 tanner.post...@gmail.comwrote:

 i believe that was the original configuration, but I can switch it back and
 see if that yields any results.


 On Wed, Sep 21, 2011 at 10:54 AM, Pulkit Singhal 
 pulkitsing...@gmail.comwrote:

 I am NOT claiming that making a copy of a copy field is wrong or leads
 to a race condition. I don't know that. BUT did you try to copy into
 the text field directly from the genre field? Instead of the
 genre_search field? Did that yield working queries?

 On Wed, Sep 21, 2011 at 12:16 PM, Tanner Postert
 tanner.post...@gmail.com wrote:
  i have 3 fields that I am working with: genre, genre_search and text.
 genre
  is a string field which comes from the data source. genre_search is a
 text
  field that is copied from genre, and text is a text field that is copied
  from genre_search and a few other fields. Text field is the default
 search
  field for queries. When I search for q=genre_search:indie+rock, solr
 returns
  several records that have both Indie as a genre and Rock as a genre,
 which
  is great, but when I search for q=indie+rock or q=text:indie+rock, i get
 no
  results.
 
  Why would the source field return the value and the destination
 wouldn't.
  Both genre_search and text are the same data type, so there shouldn't be
 any
  strange translations happening.
 






Re: Sort five random Top Offers to the top

2011-09-21 Thread Sujit Pal
Hi MOuli,

AFAIK (and I don't know that much about Solr), this feature does not
exist out of the box in Solr. One way to achieve this could be to
construct a DocSet with topoffer:true and intersect it with your result
DocSet, then randomly shuffle the intersection, take the sublist [0:5],
and move that sublist to the top of the results like
QueryElevationComponent does. Actually you may want to take a look at the
QueryElevationComponent code for inspiration (this is where I would have
looked if I had to implement something similar).
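
A lighter-weight alternative (untested) would be to run one extra query for
the top offers sorted randomly, then stitch the two result lists together in
the client. This assumes the example schema's dynamic random_* field of type
solr.RandomSortField is available:

    /solr/select?q=topoffer:true&sort=random_1234+asc&rows=5

Changing the seed in the field name (random_1234, random_42, ...) changes the
ordering, so a new seed per user or session gives a fresh random five.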

-sujit
 
On Wed, 2011-09-21 at 06:54 -0700, MOuli wrote:
 Hey Community.
 
 I got a Lucene/Solr Index with many offers. Some of them are marked by a
 flag field topoffer that they are top offers. Now I want so sort randomly
 5 of this offers on the top.
 
 For Example
 HTC Sensation
  - topoffer = true
 HTC Desire
  - topoffer = false
 Samsung Galaxy S2
  - topoffer = true
 IPhone 4
  - topoffer = true 
 ...
 
 When i search for a Handy then i want that first 3 offers are HTC Sensation,
 Samsung Galaxy S2 and the iPhone 4.
 
 
 Does anyone have an idea?
 
  PS: I hope my English is not too bad.
 



Implementing a custom ResourceLoader

2011-09-21 Thread Jithin Emmanuel
Hi,
As part of writing a Solr plugin I need to override the ResourceLoader. My
plugin is intended to be a stop word filter factory, and I need to change
the way stop words are fetched. My assumption is that overriding
ResourceLoader.getLines() will help me meet my goal of fetching the stop
word data from an external webservice.
Is this feasible? Or should I go about overriding the
factory's inform(ResourceLoader) method? Kindly let me know how to achieve
this.
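
The rough sketch I have in mind for the inform() route looks like this
(Solr 3.x APIs, untested; fetchStopWordsFromService is a placeholder for the
actual webservice call):

    import java.util.Collections;
    import java.util.List;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.solr.analysis.BaseTokenFilterFactory;
    import org.apache.solr.common.ResourceLoader;
    import org.apache.solr.util.plugin.ResourceLoaderAware;

    public class WebServiceStopFilterFactory extends BaseTokenFilterFactory
            implements ResourceLoaderAware {

        private CharArraySet stopWords;

        // called once at startup; ignore the loader and fetch from the service
        public void inform(ResourceLoader loader) {
            List<String> words = fetchStopWordsFromService();
            stopWords = new CharArraySet(luceneMatchVersion, words, true);
        }

        public TokenStream create(TokenStream input) {
            return new StopFilter(luceneMatchVersion, input, stopWords);
        }

        private List<String> fetchStopWordsFromService() {
            // placeholder: the HTTP call to the external webservice goes here
            return Collections.emptyList();
        }
    }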

-- Thanks
Jithin


Re: Two unrelated questions

2011-09-21 Thread Erick Erickson
For (1), I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey; it's not like it's something
auto-generated. Perhaps a concrete example would
help.

(2) There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
bump this very high, but you're right, if anyone actually
does something absurd it'll slow *that* query down. But
just bumping this limit higher won't change performance
absent someone actually putting a ton of items in the query...

Best
Erick

On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron rol...@lbpc.com wrote:
 Hi all-

 I'm not sure if I should break this out into two separate questions to the 
 list for searching purposes, or if one is more acceptable (don't want to 
 flood).

 I have two (hopefully) straightforward questions:

 1. Is it possible to expose the unique ID of a document to a DIH query? The 
 reason I want to do this is because I use the unique ID of the row in the 
 table as the unique ID of the Lucene document, but I've noticed that the 
 counts of documents doesn't match the count in the table; I'd like to add 
 these rows and was hoping to avoid writing a custom SolrJ app to do it.

 2. Is there any limit to the number of conditions in a Boolean search? We're 
 working on a new project where the user can choose either, for example, Ford 
 Vehicles, in which case I can simply search for Ford, but if the user 
 chooses specific makes and models, then I have to say something like Crown 
 Vic OR Focus OR Taurus OR F-150, etc., where they could theoretically choose 
 every model of Ford ever made except one. This could lead to a *very* large 
 query, and was worried both that it was even possible, but also the impact on 
 performance.


 Thanks, and I apologize if this really should be two separate messages.

 Ron




Re: Slow autocomplete(terms)

2011-09-21 Thread Erick Erickson
Think about ngrams if you really need infix searches;
you're right that the regex is very probably the
root of your problem. The index has to examine
*every* term in the field to determine whether the regex
will match.
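
Untested, but the ngram variant could look something like this (the fieldType
name and gram sizes are just illustrative):

    <fieldType name="autocomplete_ngram" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Then a plain term query like q=autocompletewhat:chest matches "manchester"
directly, because "chest" was indexed as one of its ngrams: no regex, so no
full term scan.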

Best
Erick

On Tue, Sep 20, 2011 at 12:57 AM, roySolr royrutten1...@gmail.com wrote:
 Hello,

 I used the terms request for autocomplete. It works fine with 200,000
 records, but with 2 million docs it's very slow.

 I use a regex to support autocomplete in the middle of words, for example:
 chest -> manchester.

 My call (PECL PHP Solr):

 $query = new SolrQuery();
 $query->setTermsLimit(10);

 $query->setTerms(true);
 $query->setTermsField($field);

 $term = SolrUtils::escapeQueryChars($term);
 $query->set("terms.regex", "(.*)$term(.*)");
 $query->set("terms.regex.flag", "case_insensitive");

 URL:
 /solr/terms?terms.fl=autocompletewhat&terms.regex=(.*)chest(.*)&terms.regex.flag=case_insensitive&terms=true

 I think the regex is the reason for the very high query time: Solr searches
 across 2 million docs with a regex. The query takes 2 seconds, which is too
 much for autocomplete. A user typing "manchester united" makes Solr run
 16 queries of 2 seconds each. Are there other options? Faster solutions?

 I use Solr 3.1




Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Chris Hostetter

: With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be
: ignored.  I've got both set to 35, but Solr is merging every 10 segments.  I
...
: Here's the relevant config pieces.  These two sections are in separate files
: incorporated into solrconfig.xml using xinclude:
: 
: <indexDefaults>
...

do you have a mainIndex section with mergeFactor defined there?

-Hoss


RE: Two unrelated questions

2011-09-21 Thread Olson, Ron
Thanks for the reply. As far as #1, the table I'm indexing via DIH has a PK
field, generated by a sequence, so there are records with IDs of 1, 2, 3, etc.
That same id is the one I use as the unique id field in the document
(<uniqueKey>ID</uniqueKey>).

I've noticed that the table has, say, 10 rows. My index only has 8. I don't 
know why that is, but I'd like to figure out which records are missing and add 
them (and hopefully understand why they weren't added in the first place). I 
was just wondering if there was some way to compare the two as part of a sql 
query, but on reflection, it does seem like an absurd request, so I apologize; 
I think what I'll have to do is write a solrj program that gets every ID in the 
table, then does a search on that ID in the index, and add the ones that are 
missing.

Regarding the second item, yes, it's crazy but I'm not sure what to do; there 
really are that many options and some searches will be extremely specific, yet 
broad enough in terms for this to be a problem.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 21, 2011 3:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Two unrelated questions

for 1 I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey, it's not like it's something
auto-generated. Perhaps a concrete example would
help.

2 There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
 bump this very high, but you're right, if anyone actually
does something absurd it'll slow *that* query down. But
just bumping this query higher won't change performance
absent someone actually putting a ton of items in it...

Best
Erick

On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron rol...@lbpc.com wrote:
 Hi all-

 I'm not sure if I should break this out into two separate questions to the 
 list for searching purposes, or if one is more acceptable (don't want to 
 flood).

 I have two (hopefully) straightforward questions:

 1. Is it possible to expose the unique ID of a document to a DIH query? The 
 reason I want to do this is because I use the unique ID of the row in the 
 table as the unique ID of the Lucene document, but I've noticed that the 
 counts of documents doesn't match the count in the table; I'd like to add 
 these rows and was hoping to avoid writing a custom SolrJ app to do it.

 2. Is there any limit to the number of conditions in a Boolean search? We're 
 working on a new project where the user can choose either, for example, Ford 
 Vehicles, in which case I can simply search for Ford, but if the user 
 chooses specific makes and models, then I have to say something like Crown 
 Vic OR Focus OR Taurus OR F-150, etc., where they could theoretically choose 
 every model of Ford ever made except one. This could lead to a *very* large 
 query, and was worried both that it was even possible, but also the impact on 
 performance.


 Thanks, and I apologize if this really should be two separate messages.

 Ron






Re: Two unrelated questions

2011-09-21 Thread Rob Casson
for #1, i don't use DIH, but is there any possibility of that column
having duplicate keys, with subsequent docs replacing existing ones?

and for #2, for some cases you could use a negative filterquery:

 
http://wiki.apache.org/solr/SimpleFacetParameters#Retrieve_docs_with_facets_missing

so instead of that fq=-facetField:[* TO *], something like
fq=-car_make:Taurus.  picking negatives might even make the UI a
bit easier.

anyway, just some thoughts.  cheers,
rob

On Wed, Sep 21, 2011 at 5:17 PM, Olson, Ron rol...@lbpc.com wrote:
 Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a 
 PK field, generated by a sequence, so there are records with ID of 1, 2, 3, 
 etc. That same id is the one I use in my unique id field in the document 
  (<uniqueKey>ID</uniqueKey>).

 I've noticed that the table has, say, 10 rows. My index only has 8. I don't 
 know why that is, but I'd like to figure out which records are missing and 
 add them (and hopefully understand why they weren't added in the first 
 place). I was just wondering if there was some way to compare the two as part 
 of a sql query, but on reflection, it does seem like an absurd request, so I 
 apologize; I think what I'll have to do is write a solrj program that gets 
 every ID in the table, then does a search on that ID in the index, and add 
 the ones that are missing.

 Regarding the second item, yes, it's crazy but I'm not sure what to do; there 
 really are that many options and some searches will be extremely specific, 
 yet broad enough in terms for this to be a problem.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, September 21, 2011 3:55 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Two unrelated questions

 for 1 I don't quite get what you're driving at. Your DIH
 query assigns the uniqueKey, it's not like it's something
 auto-generated. Perhaps a concrete example would
 help.

 2 There's a limit you can adjust that defaults to
 1024 (maxBooleanClauses in solrconfig.xml). You can
  bump this very high, but you're right, if anyone actually
 does something absurd it'll slow *that* query down. But
 just bumping this query higher won't change performance
 absent someone actually putting a ton of items in it...

 Best
 Erick

 On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron rol...@lbpc.com wrote:
 Hi all-

 I'm not sure if I should break this out into two separate questions to the 
 list for searching purposes, or if one is more acceptable (don't want to 
 flood).

 I have two (hopefully) straightforward questions:

 1. Is it possible to expose the unique ID of a document to a DIH query? The 
 reason I want to do this is because I use the unique ID of the row in the 
 table as the unique ID of the Lucene document, but I've noticed that the 
 counts of documents doesn't match the count in the table; I'd like to add 
 these rows and was hoping to avoid writing a custom SolrJ app to do it.

 2. Is there any limit to the number of conditions in a Boolean search? We're 
 working on a new project where the user can choose either, for example, 
 Ford Vehicles, in which case I can simply search for Ford, but if the 
 user chooses specific makes and models, then I have to say something like 
 Crown Vic OR Focus OR Taurus OR F-150, etc., where they could 
 theoretically choose every model of Ford ever made except one. This could 
 lead to a *very* large query, and was worried both that it was even 
 possible, but also the impact on performance.


 Thanks, and I apologize if this really should be two separate messages.

 Ron





Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Chris Hostetter

: Usually any good piece of java code refrains from capturing Throwable
: so that Errors will bubble up unlike exceptions. Having said that,

Even if some piece of code catches an OutOfMemoryError, the JVM should
have already called the -XX:OnOutOfMemoryError hook. Although from what
I can tell, the JVM will only call the hook on the *first* OOM thrown.

(you can try the code below to test this behavior in your own JVM) 

:  I'm trying to be notified when the error occurs.  I saw with the jvm I can
:  pass the -XX:OnOutOfMemoryError= flag and pass a script to run. Every time
:  the out of memory issue occurs though my script never runs. Does solr let

...exactly what JVM are you running?  This option is specific to the
Sun/Oracle JVM.  For example, in the IBM JVM, there is a completely
different mechanism...

http://stackoverflow.com/questions/3467219/is-there-something-like-xxonerror-or-xxonoutofmemoryerror-in-ibm-jvm


-- Simple OnOutOfMemoryError hook test -
import static java.lang.System.out;
import java.util.ArrayList;
public final class Test {
  public static void main(String... args) throws Exception {
    ArrayList<Object> data = new ArrayList<Object>(1000);
    for (int i=0; i<5; i++) {
      try {
        while (i < 5) {
          data.add(new ArrayList<Integer>(10));
        }
      } catch (OutOfMemoryError oom) {
        data.clear();
        out.println("caught");
      }
    }
  }
}
-- example of running it ---
hossman@bester:~/tmp$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
hossman@bester:~/tmp$ java -XX:OnOutOfMemoryError="echo HOOK" -Xmx64M Test
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="echo HOOK"
#   Executing /bin/sh -c "echo HOOK"...
HOOK
caught
caught
caught
caught
caught
hossman@bester:~/tmp$
--



-Hoss

Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Shawn Heisey

On 9/21/2011 3:10 PM, Chris Hostetter wrote:

: With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem to be
: ignored.  I've got both set to 35, but Solr is merging every 10 segments.  I
...
: Here's the relevant config pieces.  These two sections are in separate files
: incorporated into solrconfig.xml using xinclude:
:
:    <indexDefaults>
...

do you have a mainIndex section with mergeFactor defined there?


The mergeFactor section is in my config, but it's commented out.  I left 
out the commented sections when I included it before.  It doesn't appear 
anywhere else.  Here's the full config snippet with comments:


<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <!--
  <mergeFactor>35</mergeFactor>
  -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <!--
  <termIndexInterval>64</termIndexInterval>
  -->
  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

Here's the mainIndex section:

<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>

Thanks,
Shawn



SOLR error with custom FacetComponent

2011-09-21 Thread Ravi Bulusu
Hi All,


I'm trying to write a custom Solr facet component and I'm getting some
errors when I deploy my code into the Solr server.

Can you please let me know what I'm doing wrong? I appreciate your help on
this issue. Thanks.

*Issue*

I'm getting an error saying "Error instantiating SearchComponent: My Custom
Class is not a org.apache.solr.handler.component.SearchComponent".

My custom class inherits from *FacetComponent* which extends from *
SearchComponent*.

My custom class is defined as follows…

I implemented the process method to meet our functionality.

We have some default facets that have to be sent every time, irrespective of
the Query request.


/**

 *

 * @author ravibulusu

 */

public class MyFacetComponent extends FacetComponent {

….

}


Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Shawn Heisey

On 9/21/2011 11:18 AM, Shawn Heisey wrote:
With no mergeFactor defined, maxMergeAtOnce and segmentsPerTier seem 
to be ignored.  I've got both set to 35, but Solr is merging every 10 
segments.  I haven't tried explicitly setting mergeFactor yet to see 
if that will make the other settings override it, I'm letting the 
current import finish first.


I have tried again with mergeFactor set to 8 and the other settings in 
mergePolicy remaining at 35.  It merged after every 8th segment.  This 
is on lucene_solr_3_4 checked out from SVN, with SOLR-1972 manually 
applied.  Settings used this time:


<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>8</mergeFactor>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>
  <!--
  <termIndexInterval>64</termIndexInterval>
  -->
  <ramBufferSizeMB>96</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>

If there's anything else you'd like me to do, please let me know and 
I'll get to it as soon as I can.


Thanks,
Shawn



Re: SOLR error with custom FacetComponent

2011-09-21 Thread Erik Hatcher
Why create a custom facet component for this?

Simply add lines like this to your request handler(s):

<str name="facet.field">manu_exact</str>

either in defaults or appends sections.

Erik



On Sep 21, 2011, at 14:00 , Ravi Bulusu wrote:

 Hi All,
 
 
 I'm trying to write a custom SOLR facet component and I'm getting some
 errors when I deploy my code into the SOLR server.
 
 Can you please let me know what Im doing wrong? I appreciate your help on
 this issue. Thanks.
 
 *Issue*
 
 I'm getting an error saying "Error instantiating SearchComponent: My Custom
 Class is not a org.apache.solr.handler.component.SearchComponent".
 
 My custom class inherits from *FacetComponent* which extends from *
 SearchComponent*.
 
 My custom class is defined as follows…
 
 I implemented the process method to meet our functionality.
 
 We have some default facets that have to be sent every time, irrespective of
 the Query request.
 
 
 /**
 
 *
 
 * @author ravibulusu
 
 */
 
 public class MyFacetComponent extends FacetComponent {
 
 ….
 
 }



RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Ryan
I think the problem is that the <mergePolicy> config needs to be inside of the
<indexDefaults> config, rather than after it as you have.
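
Applied to your snippet, that would be roughly (other elements elided):

    <indexDefaults>
      <useCompoundFile>false</useCompoundFile>
      ...
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">35</int>
        <int name="segmentsPerTier">35</int>
        <int name="maxMergeAtOnceExplicit">105</int>
      </mergePolicy>
    </indexDefaults>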

-Michael


Re: OOM errors and -XX:OnOutOfMemoryError flag not working on solr?

2011-09-21 Thread Jason Toy
I am running the Sun version:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

I get multiple out-of-memory exceptions in my application and the solr
logs, but my script doesn't get called the first time or any other time,
which is why I was thinking that maybe solr is doing something different. My
script notifies me of the memory exception and then restarts the JVM.
Running the script manually works fine. I'll try to do some more testing to
see what exactly is going on.
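
For reference, a hypothetical invocation of the hook (script path and heap
size are illustrative):

java -XX:OnOutOfMemoryError="/usr/local/bin/restart-solr.sh" -Xmx8g -jar start.jar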

Jason

On Wed, Sep 21, 2011 at 2:31 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : Usually any good piece of java code refrains from capturing Throwable
 : so that Errors will bubble up unlike exceptions. Having said that,

 Even if some piece of code catches an OutOfMemoryError, the JVM should
 have already called the -XX:OnOutOfMemoryError hook - although from what
 I can tell, the JVM will only call the hook on the *first* OOM thrown

 (you can try the code below to test this behavior in your own JVM)

 :  I'm trying to be notified when the error occurs.  I saw with the jvm
 :  I can pass the -XX:OnOutOfMemoryError= flag and pass a script to run.
 :  Every time the out of memory issue occurs though my script never runs.
 :  Does solr let

 ...exactly what JVM are you running?  This option is specific to the
 Sun/Oracle JVM.  For example, in the IBM JVM, there is a completely
 different mechanism...


 http://stackoverflow.com/questions/3467219/is-there-something-like-xxonerror-or-xxonoutofmemoryerror-in-ibm-jvm


 -------- Simple OnOutOfMemoryError hook test --------
 import static java.lang.System.out;
 import java.util.ArrayList;
 public final class Test {
   public static void main(String... args) throws Exception {
     ArrayList<Object> data = new ArrayList<Object>(1000);
     for (int i = 0; i < 5; i++) {
       try {
         while (i < 5) {
           data.add(new ArrayList<Integer>(10));
         }
       } catch (OutOfMemoryError oom) {
         data.clear();
         out.println("caught");
       }
     }
   }
 }
 -------- example of running it --------
 hossman@bester:~/tmp$ java -version
 java version "1.6.0_24"
 Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
 Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
 hossman@bester:~/tmp$ java -XX:OnOutOfMemoryError="echo HOOK" -Xmx64M Test
 #
 # java.lang.OutOfMemoryError: Java heap space
 # -XX:OnOutOfMemoryError="echo HOOK"
 #   Executing /bin/sh -c "echo HOOK"...
 HOOK
 caught
 caught
 caught
 caught
 caught
 hossman@bester:~/tmp$
 --------



 -Hoss




-- 
- sent from my mobile
6176064373


Re: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Sokolov
I wonder if config-file validation would be helpful here :) I posted a 
patch in SOLR-1758 once.


-Mike

On 9/21/2011 6:22 PM, Michael Ryan wrote:

I think the problem is that the <mergePolicy> config needs to be inside of the
<indexDefaults> config, rather than after it as you have.

-Michael




RE: NRT and commit behavior

2011-09-21 Thread Tirthankar Chatterjee
Okay, but is there any threshold (index size, total docs in the index, or the
size of physical memory) at which sharding should be considered?

I am trying to find the winning combination.
Tirthankar
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, September 16, 2011 7:46 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Uhm, you're putting a lot of index into not very much memory. I really think
you're going to have to shard your index across several machines to get past
this problem. Simply increasing the size of your caches is still limited by the
physical memory you're working with.

You really have to put a profiler on the system to see what's going on. At that
size there are too many things that it *could* be to definitively answer it
with e-mails.

Best
Erick

On Wed, Sep 14, 2011 at 7:35 AM, Tirthankar Chatterjee 
tchatter...@commvault.com wrote:
 Erick,
 Also, in our solrconfig we have tried increasing the caches. Setting the
 autowarmCount values shown below to 0 helps the commit call return within a
 second, but that will slow us down on searches.

 <filterCache
      class="solr.FastLRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096"/>

 <!-- Cache used to hold field values that are quickly accessible
      by document id.  The fieldValueCache is created by default
      even if not configured here.
   <fieldValueCache
     class="solr.FastLRUCache"
     size="512"
     autowarmCount="128"
     showItems="32"
   />
 -->

 <!-- queryResultCache caches results of searches - ordered lists of
      document ids (DocList) based on a query, a sort, and the range
      of documents requested.  -->
 <queryResultCache
      class="solr.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096"/>

 <!-- documentCache caches Lucene Document objects (the stored fields for
      each document).  Since Lucene internal document ids are transient,
      this cache will not be autowarmed.  -->
 <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="512"/>
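
 The zero-autowarm variant mentioned above, sketched for the filter cache
 (the other caches would change the same way):

 <filterCache
      class="solr.FastLRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="0"/>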

 -Original Message-
 From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com]
 Sent: Wednesday, September 14, 2011 7:31 AM
 To: solr-user@lucene.apache.org
 Subject: RE: NRT and commit behavior

 Erick,
 Here are the answers to your questions:
 Our index is 267 GB.
 We are not optimizing.
 No, we have not profiled yet to find the bottleneck, but the logs indicate
 that opening the searchers is taking time.
 Nothing except Solr is running on the machine.
 Total memory is 16 GB; Tomcat has 8 GB allocated. Everything is 64-bit: OS,
 JVM, and Tomcat.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Sunday, September 11, 2011 11:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: NRT and commit behavior

 Hmm, OK. You might want to look at the non-cached filter query stuff; it's
 quite recent. The point here is that it is a filter that is applied only after
 all of the less expensive filter queries are run. One of its uses is exactly
 ACL calculations. Rather than calculate the ACL for the entire doc set, it
 only calculates access for docs that have made it past all the other elements
 of the query. See SOLR-2429 and note that it is 3.4 only (3.4 is currently
 being released).
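
 A sketch of what that looks like in a request (field name and cost value
 are hypothetical; a cost of 100 or more requests post-filtering where the
 query supports it):

 fq={!cache=false cost=150}acl:user123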

 As to why your commits are taking so long, I have no idea given that you 
 really haven't given us much to work with.

 How big is your index? Are you optimizing? Have you profiled the application
 to see what the bottleneck is (I/O, CPU, etc.)? What else is running on your
 machine? It's quite surprising that it takes that long. How much memory are
 you giving the JVM? etc...

 You might want to review: 
 http://wiki.apache.org/solr/UsingMailingLists

 Best
 Erick


 On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee 
 tchatter...@commvault.com wrote:
 Erick,
 What you said is correct. For us, the searches are based on Active
 Directory permissions which are populated in the filter query parameter, so we
 don't have any warming query concept, as we cannot fire one for every user
 ahead of time.

 What we do here is that when a user logs in, we do an invalid query (which
 returns no results, instead of '*') with the correct filter query (which is
 his permissions based on the login). This way the cache gets warmed up with
 valid docs.

 It works then.
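
 (Hypothetically, with an invented field and values, such a warm-up request
 might look like: /solr/select?q=id:nonexistent&fq=acl:(group1 OR group2).)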


 Also, can you please let me know why commit is taking 45 minutes to 1 hour on
 well-resourced hardware (multiple processors, 16 GB RAM, 64-bit VM, etc.)?
 We tried passing waitSearcher as false and found that inside the code it is
 hard-coded to true. Is there any specific reason? Can we change that value to
 honor what is being passed?
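
 In SolrJ terms, the call in question is roughly the following, assuming a
 SolrServer instance named server:

 // waitFlush=true, waitSearcher=false -- the second flag is the one
 // we found being forced back to true inside Solr
 server.commit(true, false);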

 Thanks,
 Tirthankar

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, September 01, 2011 8:38 AM
 To: 

Re: Two unrelated questions

2011-09-21 Thread tamanjit.bin...@yahoo.co.in
For *1* I have faced similar issues, and have realized that it has more to
do with the data I am trying to index. In some cases, even when I run a
full-import with DIH, unless it's a flat table that I am trying to index,
there are often issues on the data side when I try to do joins and then index
the data.

I am not sure whether you are joining two tables. If not, I would suggest that
you re-check your data and then re-index using a full-import.
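
A hypothetical DIH data-config fragment for the joined (non-flat) case, with
table and column names invented:

<document>
  <entity name="parent" query="SELECT id, name FROM parent_table">
    <entity name="child"
            query="SELECT label FROM child_table WHERE parent_id='${parent.id}'"/>
  </entity>
</document>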
