solr indexing on HDFS for high query throughput

2012-07-18 Thread vineet yadav
Hi,
I am using Solr for indexing. The index is small, around 50GB. I need to use
Solr in a high query throughput system. I am consuming the Twitter API and
need to search incoming tweets in Solr. So I want to know how I should design
such a system. Does Solr support HDFS natively? How can I index and search on
HDFS?
Thanks
Vineet Yadav


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-18 Thread Yonik Seeley
I think what makes the most sense is to limit the number of
connections to another host.  A host only has so many CPU resources,
and beyond a certain point throughput would start to suffer anyway
(and then only make the problem worse).  It also makes sense in that a
client could generate documents faster than we can index them (either
for a short period of time, or on average) and having flow control to
prevent unlimited buffering (which is essentially what this is) makes
sense.

Nick - when you switched to HttpSolrServer, things worked because this
added an explicit flow control mechanism.
A single request (i.e. an add with one or more documents) is fully
indexed to all endpoints before the response is returned.  Hence if
you have 10 indexing threads and are adding documents in batches of
100, there can be only 1000 documents buffered in the system at any
one time.
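
In SolrJ terms, the pattern described above looks roughly like this (a single
indexing thread shown; the URL and field names are placeholders, not taken
from this thread):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at your own core.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("name", "document " + i);
            batch.add(doc);

            // add() does not return until this batch is indexed, so each
            // thread has at most one batch in flight -- the flow control
            // described above.
            if (batch.size() == 100) {
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();
    }
}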

-Yonik
http://lucidimagination.com


Start solr master and solr slave with enable replication = false

2012-07-18 Thread Jamel ESSOUSSI
Hi,

Is it possible to start the Solr master and slave with the following
configuration?

- replication on the master disabled when Solr starts, but with the replication
feature still available
- polling on the slave disabled, but with the replication feature still available


-- Best Regards
-- Jamel



Re: Problems with elevation component configuration

2012-07-18 Thread igors
Hi,

Well, if I understand correctly, only the search term is important for
elevation, not the query.

Anyway, we ended up modifying QueryElevationComponent class, extracting the
search term from the query using regex.
After that, it turned out that elevation doesn't work with grouped results,
so we had to separate sorting for groups and non-groups in prepare() method
of the same class.
That was not the end of problems, because we need to show elevated results
with a different styling, so we upgraded to Solr4 and now it seems to be
working as expected.



change of API Javadoc interface funtionality in 4.0.x

2012-07-18 Thread Bernd Fehling
Dear developers,

while upgrading from 3.6.x to 4.x I have to rewrite some of my code and
search for the new methods and/or classes. In 3.6.x and older versions
the API Javadoc interface had an Index which made it easy to find the
appropriate methods. The button to call the Index was located in the
top of the web page between Deprecated and Help.

What is the sense of removing the Index from the API Javadoc for Lucene and 
Solr?

Regards
Bernd


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-18 Thread solrman
Nick,

To solve the out of memory issue, I think you can make the changes below:
1) in solrconfig.xml, reduce ramBufferSizeMB (there are two, change both)
2) in solrconfig.xml, reduce the documentCache value

To solve the issue where calling commit slows down indexing, I think you can
change the new searcher default query:
in solrconfig.xml, search for
<listener event="newSearcher" class="solr.QuerySenderListener">
and change
<str name="q">content:*</str> <str name="start">0</str> <str name="rows">10</str>
to
<str name="q">content:notexist</str> <str name="start">0</str> <str name="rows">10</str>



Does SolrEntityProcessor fulfill my requirements?

2012-07-18 Thread Vadim Kisselmann
Hi folks,

I have this case:
I want to move my Solr 4.0 from trunk to Solr 4.0 Alpha. The index
structure has changed, so I can't replicate.
10 cores are in use, each with 30 million docs. We can assume that all fields
are stored and indexed.
What is the best way to export the docs from all cores on one machine
running Solr 4.0 trunk to identically named cores on another machine running
Solr 4.0 Alpha?
SolrEntityProcessor could be one solution, but does it work with this
amount of data? I want to reindex all docs at once and not in small
parts. I can find no examples of bigger reindexing attempts with
SolrEntityProcessor.
XSLT as option two?
What would be the best way to do this? What do you think?
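
For reference, a minimal data-config sketch for this kind of core-to-core
reindex (host and core names are placeholders; how well it pages through 30
million docs per core is exactly the open question):

<dataConfig>
  <document>
    <entity name="sourceCore"
            processor="SolrEntityProcessor"
            url="http://old-host:8983/solr/core0"
            query="*:*"
            rows="1000"
            fl="*"/>
  </document>
</dataConfig>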

Best Regards
Vadim


Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread Erick Erickson
Well, option 2 won't do you any good, so speed doesn't really matter.
Your response would have a facet count for dam, all by itself, something like

<int name="damned">2</int>
<int name="dame">1</int>

etc.

which does not contain anything that lets you reconstruct the title
for autosuggest.

Best
Erick

On Tue, Jul 17, 2012 at 3:18 AM, santamaria2 aravinda@contify.com wrote:
 I'll consider using the other methods, but I'd like to know which would be
 faster among the two approaches mentioned in my opening post.



Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread santamaria2
Well silly me... you're right.

On Wed, Jul 18, 2012 at 6:44 PM, Erick Erickson [via Lucene] 
ml-node+s472066n399570...@n3.nabble.com wrote:

 Well, option 2 won't do you any good, so speed doesn't really matter.
 Your response would have a facet count for dam, all by itself, something
 like

 <int name="damned">2</int>
 <int name="dame">1</int>

 etc.

 which does not contain anything that lets you reconstruct the title
 for autosuggest.

 Best
 Erick





NGram for misspelt words

2012-07-18 Thread Husain, Yavar



I have configured NGram Indexing for some fields.

Say I search for the city Ludlow, I get the results (normal search).

If I search for Ludlo (with the w omitted) I get the results.

If I search for Ludl (with ow omitted) I still get the results.

I know that they are all partial strings of the main string, hence NGram works
perfectly.

But when I type in Ludlwo (misspelt, characters o and w interchanged) I don't
get any results. It should ideally match Ludl and provide the results.

I am not looking for edit-distance-based spell correctors. How can I make the
above NGram-based search work?

Here is my schema.xml (NGramFieldType):

<fieldType name="nGram" class="solr.TextField" positionIncrementGap="100"
           stored="false" multiValued="true">

  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
            side="front"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- potentially word delimiter, synonym filter, stop words, NOT stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

</fieldType>




Count is inconsistent between facet and stats

2012-07-18 Thread Yandong Yao
Hi Guys,

Steps to reproduce:

1) Download apache-solr-4.0.0-ALPHA
2) cd example;  java -jar start.jar
3) cd exampledocs;  ./post.sh *.xml
4) Use statsComponent to get the stats info for field 'popularity' based on
facet 'cat'.  And the 'count' for 'electronics' is 3
http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&stats=true&stats.field=popularity&stats.facet=cat

{
  "stats_fields": {
    "popularity": {
      "min": 0,
      "max": 10,
      "count": 14,
      "missing": 0,
      "sum": 75,
      "sumOfSquares": 503,
      "mean": 5.357142857142857,
      "stddev": 2.7902892835178013,
      "facets": {
        "cat": {
          "music":         { "min": 10, "max": 10, "count": 1, "missing": 0, "sum": 10, "sumOfSquares": 100, "mean": 10, "stddev": 0 },
          "monitor":       { "min": 6,  "max": 6,  "count": 2, "missing": 0, "sum": 12, "sumOfSquares": 72,  "mean": 6,  "stddev": 0 },
          "hard drive":    { "min": 6,  "max": 6,  "count": 2, "missing": 0, "sum": 12, "sumOfSquares": 72,  "mean": 6,  "stddev": 0 },
          "scanner":       { "min": 6,  "max": 6,  "count": 1, "missing": 0, "sum": 6,  "sumOfSquares": 36,  "mean": 6,  "stddev": 0 },
          "memory":        { "min": 0,  "max": 7,  "count": 3, "missing": 0, "sum": 12, "sumOfSquares": 74,  "mean": 4,  "stddev": 3.605551275463989 },
          "graphics card": { "min": 7,  "max": 7,  "count": 2, "missing": 0, "sum": 14, "sumOfSquares": 98,  "mean": 7,  "stddev": 0 },
          "electronics":   { "min": 1,  "max": 7,  "count": 3, "missing": 0, "sum": 9,  "sumOfSquares": 51,  "mean": 3,  "stddev": 3.4641016151377544 }
        }
      }
    }
  }
}
5)  Facet on 'cat' and the count is 14.
http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&facet=true&facet.field=cat

{
  "cat": [
    "electronics", 14,
    "memory", 3,
    "connector", 2,
    "graphics card", 2,
    "hard drive", 2,
    "monitor", 2,
    "camera", 1,
    "copier", 1,
    "multifunction printer", 1,
    "music", 1,
    "printer", 1,
    "scanner", 1,
    "currency", 0,
    "search", 0,
    "software", 0
  ]
}



So from StatsComponent the count for the 'electronics' cat is 3, while
FacetComponent reports 14 for 'electronics'. Is this a bug?

Following is the field definition for 'cat'.
<field name="cat" type="string" indexed="true" stored="true"
       multiValued="true"/>

Thanks,
Yandong


Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
You are creating grams only while indexing and not while querying, hence 'ludlwo'
would not match. Your analyzer will create the following grams while
indexing 'ludlow': lu lud ludl ludlo ludlow, and hence it would not match
'ludlwo'.

Either you need to create grams while querying as well, or use edit distance.
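
A sketch of the first option, with NGramFilterFactory (rather than EdgeNGram)
applied on both the index and query side; whether the resulting recall and
scoring are acceptable for your data is a separate question:

<fieldType name="nGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>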




RE: NGram for misspelt words

2012-07-18 Thread Husain, Yavar
Thanks Sahi. I have replaced my EdgeNGramFilterFactory with NGramFilterFactory, as
I need substrings not just at the front or back but anywhere.
You are right; I put the same NGramFilterFactory in both the query and index
analyzers, however now it does not return any results, not even the basic ones.

-Original Message-
From: Dikchant Sahi [mailto:contacts...@gmail.com] 
Sent: Wednesday, July 18, 2012 7:54 PM
To: solr-user@lucene.apache.org
Subject: Re: NGram for misspelt words

You are creating grams only while indexing and not querying hence 'ludlwo'
would not match. Your analyzer will create the following grams while indexing 
for 'ludlow': lu lud ludl ludlo ludlow and hence would not match to 'ludlwo'.

Either you need to create gram while querying also or use Edit Distance.




Re: NGram for misspelt words

2012-07-18 Thread Dikchant Sahi
Have you tried the analysis window to debug?

I believe you are doing something wrong in the fieldType.

On Wed, Jul 18, 2012 at 8:07 PM, Husain, Yavar yhus...@firstam.com wrote:

 Thanks Sahi. I have replaced my EdgeNGramFilterFactory to
 NGramFilterFactory as I need substrings not just in front or back but
 anywhere.
 You are right I put the same NGramFilterFactory in both Query and Index
 however now it does not return any results not even the basic one.

 



Re: edismax not working in a core

2012-07-18 Thread Erick Erickson
the ~2 is the mm parameter, I'm pretty sure. So I'd guess your configuration has
an mm parameter set on that core that isn't doing what you want.

Best
Erick

On Tue, Jul 17, 2012 at 3:05 PM, Richard Frovarp rfrov...@apache.org wrote:
 On 07/14/2012 05:32 PM, Erick Erickson wrote:

 Really hard to say. Try executing your query on the cores with
 debugQuery=on and compare the parsed results (for this you
 can probably just ignore the explain bits of the output, concentrate
 on the parsed query).


 Okay, for the example core from the project, the query was:

 test OR samsung

 parsedquery:
 +(DisjunctionMaxQuery((id:test^10.0 | text:test^0.5 | cat:test^1.4 |
 manu:test^1.1 | name:test^1.2 | features:test | sku:test^1.5))
 DisjunctionMaxQuery((id:samsung^10.0 | text:samsung^0.5 | cat:samsung^1.4 |
 manu:samsung^1.1 | name:samsung^1.2 | features:samsung | sku:samsung^1.5)))

 For my core the query was:

 frovarp OR fee

 parsedquery:

 +((DisjunctionMaxQuery((content:fee | title:fee^5.0 | mainContent:fee^2.0))
 DisjunctionMaxQuery((content:frovarp | title:frovarp^5.0 |
 mainContent:frovarp^2.0)))~2)

 What is that ~2? That's the difference. The third core that works properly
 also doesn't have the ~2.


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Erick Erickson
bq: This index is only used for searching and being replicated every 7 sec from
the master.

This is a red-flag. 7 second replication times are likely forcing your
app to spend
all its time opening new searchers. Your cached filter queries are
likely rarely being re-used
because they're being thrown away every 7 seconds. This assumes you're
changing your master index frequently.

If you need near real time, consider Solr trunk and SolrCloud, but
trying to simulate
NRT with very short replication intervals is usually a bad idea.

A quick test would be to disable replication for a bit (or lengthen it
to, say, 10 minutes)

Best
Erick

On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi f...@efendi.ca wrote:

 FWIW, when asked at what point one would want to split JVMs and shard,
 on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
 GC cost reasons. You're way above that.

 - his index is 75G, and Grant mentioned RAM heap size; we can use terabytes
 of index with 16Gb memory.







Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread Erick Erickson
But I did run across an idea a while ago... Either with a custom
update processor
or on the client side, you permute the title so you index something like:
Shadows of the Damned
of the DamnedShadows
the DamnedShadows of
DamnedShadows of the

Index these with KeywordTokenizer and LowercaseFilter.

Now, your responses from TermsComponent (prefix) contain the entire
string and you can display them correctly by rearranging the string
on the client side based on the delimiter (whichever delimiter you chose).
Still an issue with proper capitalization though, since TermsComponent only
looks at the actual indexed data and it'll be lower-cased. You could
use String, but then you're counting on the user to capitalize properly, always
a dicey call.

And TermsComponent is very fast
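
A sketch of the field type that idea implies (field and type names are
placeholders; the permuted strings would be indexed into a field of this type
and then read back through the TermsComponent):

<fieldType name="title_ac" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- then something like:
     /terms?terms=true&terms.fl=title_ac&terms.prefix=damned&terms.limit=10 -->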

FWIW
Erick



Re: Start solr master and solr slave with enable replication = false

2012-07-18 Thread Erick Erickson
See: 
http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

I'll admit that I haven't tried this personally, but I think it'll work.

Although I'm pretty sure that if you just disable the master,
disabling the polling on the slave isn't necessary.
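
Roughly, that wiki section boils down to something like the following sketch
(not tested here, so verify against the wiki page; the host name, poll
interval and defaults are placeholders to adapt):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- stays configured but disabled until started with -Denable.master=true -->
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>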

Best
Erick

On Wed, Jul 18, 2012 at 6:24 AM, Jamel ESSOUSSI
jamel.essou...@gmail.com wrote:
 Hi,

 It's possible to start the solr master and slave with the following
 configuration

 - replication on master disabled when we start solr -- the replication
 feature must be available
 - polling on slave disabled -- the replication feature must be available


 -- Best Regards
 -- Jamel



Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Mou
Hi Erick,

I totally agree. That's what I also figured ultimately. One thing I am not
clear about: the replication is supposed to be incremental? But it looks like
it is trying to replicate the whole index. Maybe I am changing the index so
frequently that it is triggering an auto merge and a full replication? Am I
thinking in the right direction?

I see that when I start the Solr search instance before I start feeding the
Solr index, my searches are fine BUT it is using the old searcher, so I am
not seeing the updates in the results.

So now I am trying to change my architecture. I am going to have a core
dedicated to receiving daily updates, which is going to be 5 million docs and
a little less than 5G in size, which is small, so replication will be faster.

I will search both cores, i.e. the old data and the daily updates, and do
field collapsing on my unique id so that I do not return duplicate results.
I haven't tried grouping results, so I am not sure about the performance. Any
suggestions?

Eventually I will have to use Solr trunk like you suggested.

Thank you for your help,


Solr faceting -- sort order

2012-07-18 Thread Christopher Gross
I have a keyword field type that I made:

<fieldType name="keyword" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_"
                replacement=" " maxBlockChars="5000"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="_"
                replacement=" " maxBlockChars="5000"/>
  </analyzer>
</fieldType>

When I do a query, the results that come through retain their original
case for this field, like:
doc 1
keyword: Blah Blah Blah
doc 2
keyword: Yadda Yadda Yadda

But when I pull back facets, I get:

blah blah blah (1)
yadda yadda yadda (1)

I was attempting to fix a sorting problem -- keyword  would show
up after keyword Zulu due to the index sorting -- so I thought that
I could lowercase it all to have it be in the same order. But now it
is all in lower case, and I'd like it to retain the original style.
Is there a different sort that I should use, or is there a change that
I can make to my keyword type that would let the facet count list show
up alphabetically but ignoring case?

Thanks!

-- Chris


Solr grouping / facet query

2012-07-18 Thread s215903406
Could anyone suggest the options available to handle the following situation:

1. Say we have 1,000 authors

2. 65% of these authors have 10-100 titles they authored; the others have
not authored any titles but provide only their biography and writing
capability. 

3. We want to search for authors, group the results by author, and show the
4 most relevant titles authored for each (if any) next to the author name.

Since not all authors have titles authored, I can't group titles by author.
Also, adding their bio to each title places a lot of duplicate data in the
index. 

So the search results would look like this;

Author A
title0, title6, title8, title3

Author G
no titles found

Author E
title4, title9, title2

Any suggestions would be appreciated!

 





Re: java.lang.AssertionError: System properties invariant violated.

2012-07-18 Thread Chris Hostetter

: I am porting 3x unittests to the solr/lucene trunk. My unittests are
: OK and pass, but in the end fail because the new rule checks for
: modifier properties. I know what the problem is, I am creating new
: system properties in the @beforeClass, but I think I need to do it
: there, because the project loads C library before initializing tests.

The purpose of the assertion is to verify that no code being tested is 
modifying system properties -- if you are setting the properties yourself 
in some @BeforeClass methods, just use System.clearProperty to unset them 
in corresponding @AfterClass methods.
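
For example, a minimal sketch (the property name here is hypothetical, standing
in for whatever your native-library setup needs):

import org.apache.solr.SolrTestCaseJ4;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class MyLibTest extends SolrTestCaseJ4 {

  @BeforeClass
  public static void setUpProps() throws Exception {
    // hypothetical property the C library needs before anything else loads
    System.setProperty("mylib.native.path", "/opt/mylib");
    initCore("solrconfig.xml", "schema.xml");
  }

  // ... @Test methods go here ...

  @AfterClass
  public static void tearDownProps() {
    // restore the JVM to its pre-test state so the invariant check passes
    System.clearProperty("mylib.native.path");
  }
}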


-Hoss


How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Pranav Prakash
I have a multivalued integer field and a multivalued string field defined
in my schema as

<field name="community_tag_ids"
       type="integer"
       indexed="true"
       stored="true"
       multiValued="true"
       omitNorms="true" />
<field name="community_tags"
       type="text"
       indexed="true"
       termVectors="true"
       stored="true"
       multiValued="true"
       omitNorms="true" />


The DIH entity and field defn for the same goes as

<entity name="document"
        dataSource="app"
        onError="skip"
        transformer="RegexTransformer"
        query="...">

  <entity name="community_tags"
          transformer="RegexTransformer"
          query="SELECT
                   group_concat(a.id SEPARATOR ',') AS community_tag_ids,
                   group_concat(a.title SEPARATOR ',') AS community_tags
                 FROM tags a JOIN tag_dets b ON a.id = b.tag_id
                 WHERE b.doc_id = ${document.id}">
    <field column="community_tag_ids" name="community_tag_ids"/>
    <field column="community_tags" splitBy=","/>
  </entity>

</entity>

The value for the field community_tags comes through correctly as an array of
strings. However the value of the field community_tag_ids is not proper:

<arr name="community_tag_ids">
  <int>[B@390c0a18</int>
</arr>

I tried chaining NumberFormatTransformer with formatStyle="number", but that
throws DataImportHandlerException: Failed to apply NumberFormat on column.
Could it be due to NULL values from the database, or because the value is not
proper? How do we handle NULL in this case?


*Pranav Prakash*

temet nosce


Re: edismax not working in a core

2012-07-18 Thread Richard Frovarp

On 07/18/2012 11:20 AM, Erick Erickson wrote:

the ~2 is the mm parameter I'm pretty sure. So I'd guess your configuration has
a mm parameter set on the core that isn't doing what you want..



I'm not setting the mm parameter or the q.op parameter. All three cores 
have a defaultOperator of OR. So I don't know where that would be coming 
from. However, if I specify a mm of 0, it appears to work just fine. 
I've added it as a default parameter to the select handler.


Thanks for pointing me in the right direction.

Richard


Re: DIH XML configs for multi environment

2012-07-18 Thread Pranav Prakash
That approach would work for core-dependent parameters. In my case, the
params are environment dependent. I think a simpler approach would be to
pass the URL param as JVM options and have these XMLs pick it up from there.

I haven't tried it yet.
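
For what it's worth, the untested idea would look something like this, assuming
DIH's variable resolver picks up JVM system properties (which this thread does
not confirm); the driver, property names and values are placeholders:

java -DsqlURL=jdbc:mysql://db-host/mydb -DsqlUser=solr -DsqlPassword=secret -jar start.jar

<!-- data-config.xml -->
<dataSource name="database"
            driver="com.mysql.jdbc.Driver"
            url="${sqlURL}"
            user="${sqlUser}"
            password="${sqlPassword}"/>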

*Pranav Prakash*

temet nosce



On Tue, Jul 17, 2012 at 5:09 PM, Markus Klose m...@shi-gmbh.com wrote:

 Hi

 There is one more approach using the property mechanism.

 You could specify the datasource like this:
 <dataSource name="database" driver="${sqlDriver}" url="${sqlURL}"/>

  And you can specify the properties in solr.xml in your core
 configuration like this:

 <core instanceDir="core1" name="core1">
   <property name="sqlURL" value="jdbc:hsqldb:/temp/example/ex"/>
 </core>


 Best regards from Augsburg

 Markus Klose
 SHI Elektronische Medien GmbH


 Adresse: Curt-Frenzel-Str. 12, 86167 Augsburg

 Tel.:   0821 7482633 26
 Tel.:   0821 7482633 0 (Zentrale)
 Mobil:0176 56516869
 Fax:   0821 7482633 29

 E-Mail: markus.kl...@shi-gmbh.com
 Internet: http://www.shi-gmbh.com

 Registergericht Augsburg HRB 17382
 Geschäftsführer: Peter Spiske
 USt.-ID: DE 182167335





 -Original Message-
 From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com]
 Sent: Wednesday, July 11, 2012 11:21
 To: solr-user@lucene.apache.org
 Subject: Re: DIH XML configs for multi environment

 http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource
 http://docs.codehaus.org/display/JETTY/DataSource+Examples


 On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash pra...@gmail.com wrote:

  That's cool. Is there something similar for Jetty as well? We use Jetty!
 
  *Pranav Prakash*
 
  temet nosce
 
 
 
  On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar 
  rahul.warawde...@gmail.com wrote:
 
   Hi Pranav,
  
   If you are using Tomcat to host Solr, you can define your data
   source in context.xml file under tomcat configuration.
   You have to refer to this datasource with the same name in all the 3
   environments from DIH data-config.xml.
   This context.xml file will vary across 3 environments having
   different credentials for dev, stag and prod.
  
   eg
    DIH data-config.xml will refer to the datasource as listed below
    <dataSource jndiName="java:comp/env/*YOUR_DATASOURCE_NAME*"
                type="JdbcDataSource" readOnly="true" />
  
   context.xml file which is located under /TOMCAT_HOME/conf folder
   will have the resource entry as follows
    <Resource name="*YOUR_DATASOURCE_NAME*" auth="Container"
              type="" username="X" password="X"
              driverClassName=""
              url=""
              maxActive="8"
              />
  
   On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com
  wrote:
  
The DIH XML config file has to be specified dataSource. In my
case, and possibly with many others, the logon credentials as well
as mysql
  server
paths would differ based on environments (dev, stag, prod). I
don't
  want
   to
end up coming with three different DIH config files, three
different handlers and so on.
   
What is a good way to deal with this?
   
   
*Pranav Prakash*
   
temet nosce
   
  
  
  
   --
   Thanks and Regards
   Rahul A. Warawdekar
  
 



 --
 Thanks and Regards
 Rahul A. Warawdekar



Re: Searcher Refrence Counts

2012-07-18 Thread Mark Miller
I'd guess the getSearcher call you are making is incrementing the ref count and 
you are not decrementing it?
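
In other words, the usual pattern looks roughly like this (SolrQueryRequest,
SolrIndexSearcher and RefCounted are the existing Solr classes; the method
itself is just a sketch):

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// inside the custom handler
void useSearcher(SolrQueryRequest req) {
  RefCounted<SolrIndexSearcher> ref = req.getCore().getSearcher();
  try {
    SolrIndexSearcher searcher = ref.get();
    // ... run the search ...
  } finally {
    ref.decref();  // without this the count only ever goes up, as observed
  }
}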

On Jul 18, 2012, at 12:17 PM, Karthick Duraisamy Soundararaj wrote:

 Hi All, 
The SolrCore seems to have a reference counted searcher with it. I 
 had to write a customSearchHandler that extends SearchHandler, and I was 
 playing around with it. I did the following change to search handler
 
  SearchHandler.java
  --
  handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
  {
     System.out.println("Reference count before search: "
         + req.getCore().getSearcher().getRefcount());  // in Eclipse
     ...
     System.out.println("Reference count after search: "
         + req.getCore().getSearcher().getRefcount());  // in Eclipse
  }
 
 
 Now, I am surprised to see Reference count not getting decremented at all. 
 Following is the sample output I get
 
Reference count before search:1
Reference count after search:2
..
Reference count before search:2
Reference count after search:3
  .
Reference count before search:4
Reference count after search:5
...

Reference count before search:3000
Reference count after search:30001
 
 
  The reference count seems to be increasing. Wouldn't this cause a memory leak?

 
 
 
 
 

- Mark Miller
lucidimagination.com













RE: How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Dyer, James
Don't you want to specify splitBy for the integer field too?
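
That is, something like (a sketch of that one-line change):

<field column="community_tag_ids" name="community_tag_ids" splitBy=","/>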

Actually though, you shouldn't need to use GROUP_CONCAT and RegexTransformer at
all.  DIH is designed to handle 1-to-many relations between parent and child
entities by populating all the child fields as multi-valued automatically.  I
guess your approach leads to a lot fewer rows getting sent from your db to Solr
though.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311





Re: java.lang.AssertionError: System properties invariant violated.

2012-07-18 Thread Roman Chyla
Thank you! I haven't really understood the LuceneTestCase.classRules
before this.

roman

On Wed, Jul 18, 2012 at 3:11 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : I am porting 3x unittests to the solr/lucene trunk. My unittests are
 : OK and pass, but in the end fail because the new rule checks for
 : modifier properties. I know what the problem is, I am creating new
 : system properties in the @beforeClass, but I think I need to do it
 : there, because the project loads C library before initializing tests.

 The purpose ot the assertion is to verify that no code being tested is
 modifying system properties -- if you are setting hte properties yourself
 in some @BeforeClass methods, just use System.clearProperty to unset them
 in corrisponding @AfterClass methods


 -Hoss


Re: edismax not working in a core

2012-07-18 Thread Richard Frovarp

On 07/18/2012 02:39 PM, Richard Frovarp wrote:

On 07/18/2012 11:20 AM, Erick Erickson wrote:

the ~2 is the mm parameter I'm pretty sure. So I'd guess your
configuration has
a mm parameter set on the core that isn't doing what you want..



I'm not setting the mm parameter or the q.op parameter. All three cores
have a defaultOperator of OR. So I don't know where that would be coming
from. However, if I specify a mm of 0, it appears to work just fine.
I've added it as a default parameter to the select handler.

Thanks for pointing me in the right direction.

Richard


Okay, that's wrong. Term boosting isn't working either, and what I did 
above just turns everything into an OR query.


I did figure out the problem, however. In the core that wasn't working, 
one of the query field names wasn't correct. No errors were ever thrown, 
it just made the query behave in a very odd way.


I finally figured it out after debugging each field independently of the
others.


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
I have realized this is not specific to SolrJ but to my instance of Solr.
Using curl to delete by query is not working either.

Running
curl http://localhost:8983/solr/coupon/update -H "Content-Type: text/xml"
--data-binary '<delete><query>*:*</query></delete>'

Yields this in the logs:
INFO: [coupon] webapp=/solr path=/update
params={stream.body=<delete><query>*:*</query></delete>}
{deleteByQuery=*:*} 0 0

But the corpus of documents in the core do not change.

My solrconfig is pretty barebones at this point, but I attached it in case
anyone sees something strange. Anyone have any idea why documents aren't
getting deleted?

Thanks in advance,
Briggs Thompson

On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
w.briggs.thomp...@gmail.com wrote:

 Hello All,

 I am using 4.0 Alpha and running into an issue with indexing using
 HttpSolrServer (SolrJ).

 Relevant java code:
 HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
 solrServer.setRequestWriter(new BinaryRequestWriter());

 Relevant Solrconfig.xml content:

   <requestHandler name="/update" class="solr.UpdateRequestHandler" />

   <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />

 Indexing documents works perfectly fine (using addBeans()), however, when
 trying to do deletes I am seeing issues. I tried to do
 a solrServer.deleteByQuery("*:*") followed by a commit and optimize, and
 nothing is deleted.

 The response from delete request is a success, and even in the solr logs
 I see the following:

 INFO: [coupon] webapp=/solr path=/update/javabin
 params={wt=javabinversion=2} {deleteByQuery=*:*} 0 1
 Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start
 commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}



 I tried removing the binaryRequestWriter and have the request send out in
 default format, and I get the following error.

 SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType:
 application/octet-stream  Not in: [application/xml, text/csv, text/json,
 application/csv, application/javabin, text/xml, application/json]

 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
  at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
  at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:636)


 I thought that an optimize does the same thing as expungeDeletes, but in
 the log I see expungeDeletes=false. Is there a way to force that using
 SolrJ?

 Thanks in advance,
 Briggs


?xml version=1.0 encoding=UTF-8 ?
!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the License); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an AS IS BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
--

!--
 This is a stripped down 

Custom JUnit tests based on SolrTestCaseJ4 fails intermittently.

2012-07-18 Thread Koorosh Vakhshoori
Hi,
  I am trying out the Solr Alpha release against some custom code and JUnit
tests I have written. I am seeing my custom JUnit tests fail once in a while.
The tests are based on Solr JUnit test code, extending SolrTestCaseJ4.
My guess is that the randomized testing is coming across some issue here,
however I am not sure what the source of the problem is. I noticed the
value of 'codec' is null for failed cases, but I am setting the
luceneMatchVersion value in solrconfig.xml as below:

  <luceneMatchVersion>${tests.luceneMatchVersion:LUCENE_CURRENT}</luceneMatchVersion>

  I am including the test outputs for both scenarios here.

  Any help or pointers appreciated.
  
  Thanks,
  
  Koorosh
  

Here is the output of the JUnit test which fails when running it from Eclipse:
  
NOTE: test params are: codec=null, sim=null, locale=null, timezone=(null)
NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.6.0_21
(64-bit)/cpus=4,threads=1,free=59414480,total=63242240
NOTE: All tests run in this JVM: [TestDocsHandler]
Jul 18, 2012 3:55:25 PM com.carrotsearch.randomizedtesting.RandomizedRunner
runSuite
SEVERE: Panic: RunListener hook shouldn't throw exceptions.
java.lang.NullPointerException
at
org.apache.lucene.util.RunListenerPrintReproduceInfo.reportAdditionalFailureInfo(RunListenerPrintReproduceInfo.java:159)
at
org.apache.lucene.util.RunListenerPrintReproduceInfo.testRunFinished(RunListenerPrintReproduceInfo.java:104)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:634)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)

Here is the output for the same test where it is successful:

24 T11 oas.SolrTestCaseJ4.initCore initCore
Creating dataDir:
C:\Users\xuser\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084
43 T11 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr
(NoInitialContextEx)
43 T11 oasc.SolrResourceLoader.locateSolrHome using system property
solr.solr.home: solr-gold/solr-extraction
45 T11 oasc.SolrResourceLoader.init new SolrResourceLoader for deduced
Solr Home: 'solr-gold/solr-extraction\'
284 T11 oasc.SolrConfig.init Using Lucene MatchVersion: LUCENE_40
429 T11 oasc.SolrConfig.init Loaded SolrConfig: solrconfig-dow.xml
434 T11 oass.IndexSchema.readSchema Reading Solr Schema
443 T11 oass.IndexSchema.readSchema Schema name=SolvNet Common core
522 T11 oass.IndexSchema.readSchema default search field in schema is
indexed_content
524 T11 oass.IndexSchema.readSchema query parser default operator is AND
525 T11 oass.IndexSchema.readSchema unique key field: id
616 T11 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr
(NoInitialContextEx)
617 T11 oasc.SolrResourceLoader.locateSolrHome using system property
solr.solr.home: solr-gold/solr-extraction
617 T11 oasc.SolrResourceLoader.init new SolrResourceLoader for directory:
'solr-gold/solr-extraction\'
618 T11 oasc.CoreContainer.init New CoreContainer 994682772
642 T11 oasc.SolrCore.init [collection1] Opening new SolrCore at
solr-gold/solr-extraction\,
dataDir=C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\
642 T11 oasc.SolrCore.init JMX monitoring not detected for core:
collection1
648 T11 oasc.SolrCore.getNewIndexDir WARNING New index directory detected:
old=null
new=C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\index/
648 T11 oasc.SolrCore.initIndex WARNING [collection1] Solr index directory
'C:\Users\koo\AppData\Local\Temp\solrtest-TestDocsHandler-1342651924084\index'
doesn't exist. Creating new index...
742 T11 oasc.SolrDeletionPolicy.onCommit SolrDeletionPolicy.onCommit:
commits:num=1

commit{dir=MockDirWrapper(org.apache.lucene.store.RAMDirectory@44023756
lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ed5459),segFN=segments_1,generation=1,filenames=[segments_1]
743 T11 oasc.SolrDeletionPolicy.updateCommits newest commit = 1
871 T11 oasc.RequestHandlers.initHandlersFromConfig created /update/javabin:
solr.BinaryUpdateRequestHandler
875 T11 oasc.RequestHandlers.initHandlersFromConfig created standard:
solr.StandardRequestHandler
878 T11 oasc.RequestHandlers.initHandlersFromConfig created /update:
solr.XmlUpdateRequestHandler
878 T11 oasc.RequestHandlers.initHandlersFromConfig created /admin/:
org.apache.solr.handler.admin.AdminHandlers
886 T11 oasc.RequestHandlers.initHandlersFromConfig created /update/extract:
com.synopsys.ies.solr.backend.handler.extraction.SolvNetExtractingRequestHandler
891 T11 oasc.RequestHandlers.initHandlersFromConfig WARNING Multiple
requestHandler registered to the same name: standard ignoring:
org.apache.solr.handler.StandardRequestHandler
892 T11 oasc.RequestHandlers.initHandlersFromConfig created standard:
solr.SearchHandler
892 T11 oasc.RequestHandlers.initHandlersFromConfig created employee:

Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Erick Erickson
Replication will indeed be incremental. But if you commit too often (and
committing too often is a common mistake) then the merging will
eventually merge everything into new segments and the whole thing will
be replicated.

Additionally, optimizing (or forceMerge in 4.x) will make a single segment
and force the entire index to replicate.

You should emphatically _not_ have to have two cores. Solr is built to
handle replication etc. I suspect you're committing too often or have some
other misconfiguration and are creating a problem for yourself.

Here's what I'd do:
1 increase the polling interval to, say, 10 minutes (or however long you can
live with stale data) on the slave.

2 decrease the commits you're  doing. This could involve the autocommit options
you might have set in solrconfig.xml. It could be your client (don't
know how you're
indexing, solrJ?) and the commitWithin parameter. Could be you're
optimizing (if you
are, stop it!).

Note that ramBufferSizeMB has no influence on how often things are _committed_.
When this limit is exceeded, the accumulated indexing data is written
to the currently-open
segment. Multiple flushes can go to the _same_ segment. The write-once nature of
segments means that after a segment is closed (through a commit), it
is not changed. But
a segment that is not closed may be written to multiple times until it's closed.
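
For reference, the autoCommit settings mentioned in point 2 live in
solrconfig.xml and look something like this (the numbers are placeholders to
tune, not a recommendation):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50000</maxDocs>
    <maxTime>600000</maxTime> <!-- milliseconds; here 10 minutes -->
  </autoCommit>
</updateHandler>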

HTH
Erick

On Wed, Jul 18, 2012 at 1:25 PM, Mou mouna...@gmail.com wrote:
 Hi Eric,

 I totally agree. That's what I also figured ultimately. One thing I am not
 clear.  The replication is supposed to be incremental ?  But looks like it
 is trying to replicate the whole index. May be I am changing the index so
 frequently, it is triggering auto merge and a full replication ? I am
 thinking in right direction?

 I see that when I start the solr search instance before I start feeding the
 solr Index, my searches are fine BUT it is using the old searcher so I am
 not seeing the updates in the result.

 So now I am trying to change my architecture. I am going to have a core
 dedicated to receive daily updates, which is going to be 5 million docs and
 size is going to be little less than 5 G, which is small and replication
 will be faster?

 I will search both the cores i.e. old data and the daily updates and do a
 field collapsing on my unique id so that I do not return duplicate results
 .I haven't tried grouping results ; so not sure about  the performance. Any
 suggestion ?

 Eventually I have to use Solr trunk like you suggested.

 Thank you for your help,

 On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] 
 ml-node+s472066n3995754...@n3.nabble.com wrote:

 bq: This index is only used for searching and being replicated every 7 sec
 from
 the master.

 This is a red-flag. 7 second replication times are likely forcing your
 app to spend
 all its time opening new searchers. Your cached filter queries are
 likely rarely being re-used
 because they're being thrown away every 7 seconds. This assumes you're
 changing your master index frequently.

 If you need near real time, consider Solr trunk and SolrCloud, but
 trying to simulate
 NRT with very short replication intervals is usually a bad idea.

 A quick test would be to disable replication for a bit (or lengthen it
 to, say, 10 minutes)

 Best
 Erick

 On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi [hidden email] wrote:

 
  FWIW, when asked at what point one would want to split JVMs and shard,
  on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
  GC cost reasons. You're way above that.
 
  - his index is 75G, and Grant mentioned RAM heap size; we can use
 terabytes
  of index with 16Gb memory.
 
 
 
 
 





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995774.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Brendan Grainger
Hi Briggs,

I'm not sure about Solr 4.0, but do you need to commit?

 curl http://localhost:8983/solr/coupon/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'


Brendan


www.kuripai.com

On Jul 18, 2012, at 7:11 PM, Briggs Thompson wrote:

 I have realized this is not specific to SolrJ but to my instance of Solr. 
 Using curl to delete by query is not working either. 
 
 Running 
 curl http://localhost:8983/solr/coupon/update -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
 
 Yields this in the logs:
 INFO: [coupon] webapp=/solr path=/update 
 params={stream.body=<delete><query>*:*</query></delete>} {deleteByQuery=*:*}
 0 0
 
 But the corpus of documents in the core do not change. 
 
 My solrconfig is pretty barebones at this point, but I attached it in case 
 anyone sees something strange. Anyone have any idea why documents aren't 
 getting deleted?
 
 Thanks in advance,
 Briggs Thompson
 
 On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
 w.briggs.thomp...@gmail.com wrote:
 Hello All,
 
 I am using 4.0 Alpha and running into an issue with indexing using 
 HttpSolrServer (SolrJ). 
 
 Relevant java code:
 HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
 solrServer.setRequestWriter(new BinaryRequestWriter());
 
 Relevant Solrconfig.xml content:
   <requestHandler name="/update" class="solr.UpdateRequestHandler" />
   <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
 
 Indexing documents works perfectly fine (using addBeans()), however, when 
 trying to do deletes I am seeing issues. I tried to do a 
 solrServer.deleteByQuery("*:*") followed by a commit and optimize, and
 nothing is deleted. 
 
 The response from delete request is a success, and even in the solr logs I 
 see the following:
 INFO: [coupon] webapp=/solr path=/update/javabin 
 params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
 Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start 
 commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
 
 
 I tried removing the binaryRequestWriter and have the request send out in 
 default format, and I get the following error. 
 SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: 
 application/octet-stream  Not in: [application/xml, text/csv, text/json, 
 application/csv, application/javabin, text/xml, application/json]
   at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
   at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
   at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
   at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:636)
 
 
 I thought that an optimize does the same thing as expungeDeletes, but in the 
 log I see expungeDeletes=false. Is there a way to force that using SolrJ?
 
 Thanks in advance,
 Briggs
 
 
 solrconfig.xml



SOLR 4 ALPHA /terms /browse

2012-07-18 Thread Nick Koton
When I set up a 2-shard cluster using the example and run it through its
paces, I find two features that do not work as I expect.  Any suggestions on
adjusting my configuration or expectations would be appreciated.

/terms does not return any terms when issued as follows:
http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s
but does return reasonable results when distrib is turned off like so
http://hostname:8983/solr/terms?terms.fl=name&terms=true&distrib=false&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s

/browse returns this stack trace to the browser
HTTP ERROR 500

Problem accessing /solr/browse. Reason:

{msg=ZkSolrResourceLoader does not support getConfigDir() - likely, what
you are trying to do is not supported in ZooKeeper
mode,trace=org.apache.solr.common.cloud.ZooKeeperException:
ZkSolrResourceLoader does not support getConfigDir() - likely, what you are
trying to do is not supported in ZooKeeper mode
at
org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader
.java:99)
at
org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWr
iter.java:117)
at
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter
.java:40)
at
org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.write(SolrCore.
java:1990)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.
java:398)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
276)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler
.java:1337)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119
)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java
:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java
:1065)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:
192)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:
999)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117
)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHand
lerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.
java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1
11)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpCo
nnection.java:454)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpCo
nnection.java:47)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpC
onnection.java:890)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplet
e(AbstractHttpConnection.java:944)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnectio
n.java:66)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketCon
nector.java:254)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:
599)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:5
34)
at java.lang.Thread.run(Thread.java:662)
,code=500}

Best regards,
Nick Koton
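
A note on the /terms case: the distributed TermsComponent needs its shard
sub-requests routed to a handler that actually includes the terms component,
which is normally done with the shards.qt parameter. Whether that is enough
on 4.0-alpha's cloud mode is an assumption on my part, but it is worth a try:

  http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&terms.sort=index&terms.prefix=s&shards.qt=/terms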





Solr multiple cores activation

2012-07-18 Thread Praful Bagai
I am implementing a search engine with Nutch as web crawler and Solr for
searching. Now, since Nutch has no search-user-interface any more, I came to
know about Ajax-Solr as search-user-interface.

I implemented Ajax-Solr with no hindrance, but during its search operation
it only searches the Reuters data. If I want to crawl the complete web,
other than the Reuters data, using Nutch and integrate it with Solr, then I
have to replace Solr's schema.xml file with Nutch's schema.xml file, which
will not match the Ajax-Solr configuration. By replacing the
schema.xml files, Ajax-Solr *won't* work!!!

So, I found a solution to this (correct me if I am wrong), i.e. to activate
multiple cores, which means integrating Solr with Nutch in one core (i.e.
indexing) and using Ajax-Solr in the other.

I tried activating multiple cores, i.e. integrating Solr with Nutch in one
core and Ajax-Solr in the other, but with *NO luck*. I tried every single
thing, every permutation and combination, but failed to set them up.
I followed these links
1) http://wiki.apache.org/solr/CoreAdmin
2)
http://www.plaidpony.com/blog/post/2011/04/Multicore-SOLR-And-Tomcat-On-Windows-Server-2008-R2.aspx


But they didn't help either. Can you please tell me how to set them up???
Been stuck on this for over 2 days now. Kindly help!!!

Are there any other search-user-interfaces??


Thanks
Regards

Praful Bagai
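
In case it helps, a minimal two-core layout is mostly a matter of a solr.xml
in the Solr home that lists the cores, each with its own conf/ directory.
The core names and directories below are made up for illustration:

  <!-- $SOLR_HOME/solr.xml -->
  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <!-- core holding the Nutch crawl, with Nutch's schema.xml in nutch/conf -->
      <core name="nutch" instanceDir="nutch" />
      <!-- core behind the Ajax-Solr UI, with its own schema.xml in ajaxsolr/conf -->
      <core name="ajaxsolr" instanceDir="ajaxsolr" />
    </cores>
  </solr>

Each core is then addressed by its own URL, e.g.
http://localhost:8983/solr/nutch/select and http://localhost:8983/solr/ajaxsolr/select.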


RE: Could I use Solr to index multiple applications?

2012-07-18 Thread Zhang, Lisheng
Yury and Shashi,

Thanks very much for helps! I am studying the options pointed
out by you (Solr multiple cores and Elasticsearch).

Best regards, Lisheng

-Original Message-
From: Yury Kats [mailto:yuryk...@yahoo.com]
Sent: Tuesday, July 17, 2012 7:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Could I use Solr to index multiple applications?


On 7/17/2012 9:26 PM, Zhang, Lisheng wrote:
 Thanks very much for quick help! Multicore sounds interesting,
 I roughly read the doc, so we need to put each core name into
 Solr config XML, if we add another core and change XML, do we
 need to restart Solr?

You can add/create cores on the fly, without restarting.
See http://wiki.apache.org/solr/CoreAdmin#CREATE
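
For example (names and paths below are illustrative), a new core can be
created at runtime with a CoreAdmin call such as:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=app2&instanceDir=app2&config=solrconfig.xml&schema=schema.xml&dataDir=data

The instanceDir, with its conf/solrconfig.xml and conf/schema.xml, generally
has to exist on disk before the call.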


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Mou
Increasing the polling interval does help. But the requirement is to get a
document indexed and searchable instantly (sounds like RTS); 30 sec is
acceptable. I need to look at Solr NRT and cloud.

I created a new core to accept daily updates and replicate every 10 sec.
Two other cores with 234 Million documents are configured to replicate only
once a day.
I am feeding all three cores but two big cores are not replicating. While
searching I am running a group.field on my unique id and taking the most
updated one. Right now it looks fine. Every day I am going to delete the
last day's records from the daily update.

I am planning to use rsync for replication; it will be Fusion-io to Fusion-io,
so hopefully it will be very fast. What do you think?

We use a Windows service (written in .NET C#) to feed the data using REST
calls. That is really fast; we can feed more than 15 million documents a day
to two cores easily. I am using the solrconfig autocommit = 5 sec.

I could not figure out how I was able to achieve those numbers in my test
environment; all configuration was the same except I had a lot less memory in
test! I am trying to find out what I am missing in the other configuration.
My SLES kernel version is different in production (it's a 3.0.*, test was
2.6.*), but I do not think that can cause a problem.

Thank you again,
Mou
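
As a concrete sketch of the grouped query Mou describes (core, field, and
host names are placeholders, and distributed grouping needs a Solr version
that supports it):

  http://host:8983/solr/daily/select?q=some+query
      &shards=host:8983/solr/daily,host:8983/solr/archive
      &group=true&group.field=my_unique_id&group.limit=1
      &sort=timestamp+desc

With group.limit=1 and a newest-first sort, each unique id contributes only
its most recently updated document to the results.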

On Wed, Jul 18, 2012 at 6:26 PM, Erick Erickson [via Lucene] 
ml-node+s472066n3995861...@n3.nabble.com wrote:

 Replication will indeed be incremental. But if you commit too often (and
 committing too often a common mistake) then the merging will
 eventually merge everything into new segments and the whole thing will
 be replicated.

 Additionally, optimizing (or forceMerge in 4.x) will make a single segment
 and force the entire index to replicate.

 You should emphatically _not_ have to have two cores. Solr is built to
 handle replication etc. I suspect your committing too often or some
 other mis-configuration and you're creating a problem for yourself.

 Here's what I'd do:
 1 increase the polling interval to, say, 10 minutes (or however long you
 can
 live with stale data) on the slave.

 2 decrease the commits you're  doing. This could involve the autocommit
 options
 you might have set in solrconfig.xml. It could be your client (don't
 know how you're
 indexing, solrJ?) and the commitWithin parameter. Could be you're
 optimizing (if you
 are, stop it!).

 Note that ramBufferSizeMB has no influence on how often things are
 _committed_.
 When this limit is exceeded, the accumulated indexing data is written
 to the currently-open
 segment. Multiple flushes can go to the _same_ segment. The write-once
 nature of
 segments means that after a segment is closed (through a commit), it
 is not changed. But
 a segment that is not closed may be written to multiple times until it's
 closed.

 HTH
 Erick

 On Wed, Jul 18, 2012 at 1:25 PM, Mou [hidden email] wrote:

  Hi Eric,
 
  I totally agree. That's what I also figured ultimately. One thing I am
 not
  clear.  The replication is supposed to be incremental ?  But looks like
 it
  is trying to replicate the whole index. May be I am changing the index
 so
  frequently, it is triggering auto merge and a full replication ? I am
  thinking in right direction?
 
  I see that when I start the solr search instance before I start feeding
 the
  solr Index, my searches are fine BUT it is using the old searcher so I
 am
  not seeing the updates in the result.
 
  So now I am trying to change my architecture. I am going to have a core
  dedicated to receive daily updates, which is going to be 5 million docs
 and
  size is going to be little less than 5 G, which is small and replication
  will be faster?
 
  I will search both the cores i.e. old data and the daily updates and do
 a
  field collapsing on my unique id so that I do not return duplicate
 results
  .I haven't tried grouping results ; so not sure about  the performance.
 Any
  suggestion ?
 
  Eventually I have to use Solr trunk like you suggested.
 
  Thank you for your help,
 
  On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] [hidden email] wrote:
 
  bq: This index is only used for searching and being replicated every 7
 sec
  from
  the master.
 
  This is a red-flag. 7 second replication times are likely forcing your
  app to spend
  all its time opening new searchers. Your cached filter queries are
  likely rarely being re-used
  because they're being thrown away every 7 seconds. This assumes you're
  changing your master index frequently.
 
  If you need near real time, consider Solr trunk and SolrCloud, but
  trying to simulate
  NRT with very short replication intervals is usually a bad idea.
 
  A quick test would be to disable replication for a bit (or lengthen it
  to, say, 10 minutes)
 
  Best
  Erick
 
  On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi [hidden email]
 

Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Yury Kats
On 7/18/2012 7:11 PM, Briggs Thompson wrote:
 I have realized this is not specific to SolrJ but to my instance of Solr. 
 Using curl to delete by query is not working either. 

Can be this: https://issues.apache.org/jira/browse/SOLR-3432
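
If it is that issue (deleteByQuery being silently dropped when the update log
is enabled but the schema has no _version_ field; that is my reading of the
ticket, so treat it as an assumption), the fix is either to add the field the
update log expects to schema.xml:

  <field name="_version_" type="long" indexed="true" stored="true"/>

or to remove the <updateLog/> element from the updateHandler section of
solrconfig.xml if the transaction log is not needed.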


Re: Quick Confirmation on LocalSolrQueryRequest close

2012-07-18 Thread Karthick Duraisamy Soundararaj
I put my question wrong. Excuse me for spamming; it's been a tiring couple
of days and I am almost sleep-typing. Please read the snippet again.

This might be a dumb question. But I would like to confirm.

 Will the following snippet cause an index searcher leak and end up in an
 out-of-memory exception when new searchers are created?

 class myCustomHandler extends SearchHandler {
  .
   void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {

   LocalSolrQueryRequest newReq =
       new LocalSolrQueryRequest(req.getCore(), req.getParams());
   .
   //  newReq.close()   <-- Will removing this lead to OOME?
   }

My conviction is yes. But just want to confirm..






On Wed, Jul 18, 2012 at 11:04 PM, Karthick Duraisamy Soundararaj 
karthick.soundara...@gmail.com wrote:

 This might be a dumb question. But I would like to confirm.

 Will the following snippet cause a index searcher leak and end up in an
 out of memory exception when newsearchers are created?

 class myCustomHandler extends SearchHandler {
  .
   void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {

   LocalSolrQueryRequest newReq = new LocalSolrQueryRequest();
   newReq = req.getCore();
   .
   newReq.close()
   }

 My conviction is yes. But just want to confirm..
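
For what it's worth, a sketch of the pattern that avoids the leak: close() is
what releases the searcher reference a request may be holding, so it belongs
in a finally block. Parameter handling here is illustrative only:

  // org.apache.solr.request.SolrQueryRequest / LocalSolrQueryRequest,
  // org.apache.solr.common.params.ModifiableSolrParams
  SolrQueryRequest newReq =
      new LocalSolrQueryRequest(req.getCore(), new ModifiableSolrParams());
  try {
      // ... use newReq, e.g. newReq.getSearcher() ...
  } finally {
      newReq.close();  // releases the searcher reference; skip this and searchers pile up
  }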



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
Yury,

Thank you so much! That was it. Man, I spent a good long while
troubleshooting this. Probably would have spent quite a bit more time. I
appreciate your help!!

-Briggs

On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:

 On 7/18/2012 7:11 PM, Briggs Thompson wrote:
  I have realized this is not specific to SolrJ but to my instance of
 Solr. Using curl to delete by query is not working either.

 Can be this: https://issues.apache.org/jira/browse/SOLR-3432



Frustrating differences in fieldNorm between two different versions of solr indexing the same document

2012-07-18 Thread Aaron Daubman
Greetings,

I've been digging in to this for two days now and have come up short -
hopefully there is some simple answer I am just not seeing:

I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as
identically as possible (given deprecations) and indexing the same document.

For most queries the results are very close (scores agreeing to three
significant digits, almost identical positions in the results).

However, for certain documents, the scores are very different (causing
these docs to be ranked +/- 25 positions different or more in the results)

In looking at debugQuery output, it seems like this is due to fieldNorm
values being lower for the 3.6.0 instance than the 1.4.1.

(note that for most docs, the fieldNorms are identical)

I have taken the field values for the example below and run them
through /admin/analysis.jsp on each solr instance. Even for the problematic
docs/fields, the results are almost identical. For the example below, the
t_tag values for the problematic doc:
1.4.1: 162 values
3.6.0: 164 values

note that 1/sqrt(162) = 0.07857 ~= fieldNorm for 1.4.1,
however, (1/0.0625)^2 = 256, which is nowhere near 164

Here is a particular example from 1.4.1:
1.6263733 = (MATCH) fieldWeight(t_tag:soul in 2066419), product of:
   3.8729835 = tf(termFreq(t_tag:soul)=15)
   5.3750753 = idf(docFreq=27619, maxDocs=2194294)
   0.078125 = fieldNorm(field=t_tag, doc=2066419)

And the same from 3.6.0:
1.3042576 = (MATCH) fieldWeight(t_tag:soul in 1977957), product of:
   3.8729835 = tf(termFreq(t_tag:soul)=15)
   5.388126 = idf(docFreq=27740, maxDocs=2232857)
   0.0625 = fieldNorm(field=t_tag, doc=1977957)


Here is the 1.4.1 config for the t_tag field and text type:
<fieldtype name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
        ignoreCase="true"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
  </analyzer>
</fieldtype>
<dynamicField name="t_*" type="text" indexed="true" stored="true"
    required="false" multiValued="true" termVectors="true"/>


And 3.6.0 schema config for the t_tag field and text type:
<fieldtype name="text" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
        words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="t_tag" type="text" indexed="true" stored="true"
    required="false" multiValued="true"/>

I at first got distracted by this change between versions:
LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default. This
means that terms with a position increment gap of zero do not affect the
norms calculation by default.
However, this doesn't appear to be causing the issue as, according to
analysis.jsp there is no overlap for t_tag...

Can you point me to where these fieldNorm differences are coming from and
why they'd only be happening for a select few documents for which the content
doesn't stand out?

Thank you,
 Aaron
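
One detail that can magnify tiny term-count differences in debugQuery output:
the norm is stored as a single byte (Lucene's SmallFloat single-byte encoding
with a 3-bit mantissa), so nearby values of 1/sqrt(numTerms) snap to coarse
steps such as 0.078125 and 0.0625. A quick sketch to see where a given count
lands; it only reproduces the encode/decode step and does not explain why the
two indexes counted different terms:

  import org.apache.lucene.util.SmallFloat;

  public class NormSteps {
      public static void main(String[] args) {
          for (int numTerms : new int[]{150, 162, 164, 200, 256}) {
              // DefaultSimilarity length norm with no index-time boosts
              float raw = (float) (1.0 / Math.sqrt(numTerms));
              byte b = SmallFloat.floatToByte315(raw);   // what gets written to the index
              System.out.printf("terms=%3d raw=%.6f stored=%.6f%n",
                      numTerms, raw, SmallFloat.byte315ToFloat(b));
          }
      }
  }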


Re: How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Pranav Prakash
I had tried splitBy for the numeric field, but that did not work for me
either. However I got rid of group_concat and it was all good to go.

Thanks a lot!! I really had a difficult time understanding this behavior.


*Pranav Prakash*

temet nosce
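
A sketch of the plain child-entity form James suggests below, using the
tables from this thread: no GROUP_CONCAT and no RegexTransformer. DIH turns
the multiple child rows into multi-valued field values on its own (column
aliases are illustrative):

  <entity name="community_tags"
          query="SELECT a.id    AS community_tag_ids,
                        a.title AS community_tags
                 FROM tags a JOIN tag_dets b ON a.id = b.tag_id
                 WHERE b.doc_id = ${document.id}">
    <field column="community_tag_ids" name="community_tag_ids"/>
    <field column="community_tags" name="community_tags"/>
  </entity>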



On Thu, Jul 19, 2012 at 1:34 AM, Dyer, James james.d...@ingrambook.comwrote:

 Don't you want to specify splitBy for the integer field too?

 Actually though, you shouldn't need to use GROUP_CONCAT and
 RegexTransformer at all.  DIH is designed to handle 1many relations
 between parent and child entities by populating all the child fields as
 multi-valued automatically.  I guess your approach leads to a lot fewer
 rows getting sent from your db to Solr though.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Pranav Prakash [mailto:pra...@gmail.com]
 Sent: Wednesday, July 18, 2012 2:38 PM
 To: solr-user@lucene.apache.org
 Subject: How To apply transformation in DIH for multivalued numeric field?

 I have a multivalued integer field and a multivalued string field defined
 in my schema as

 <field name="community_tag_ids"
        type="integer"
        indexed="true"
        stored="true"
        multiValued="true"
        omitNorms="true" />
 <field name="community_tags"
        type="text"
        indexed="true"
        termVectors="true"
        stored="true"
        multiValued="true"
        omitNorms="true" />


 The DIH entity and field defn for the same goes as

 <entity name="document"
         dataSource="app"
         onError="skip"
         transformer="RegexTransformer"
         query="...">

   <entity name="community_tags"
           transformer="RegexTransformer"
           query="SELECT
                    group_concat(a.id SEPARATOR ',') AS community_tag_ids,
                    group_concat(a.title SEPARATOR ',') AS community_tags
                  FROM tags a JOIN tag_dets b ON a.id = b.tag_id
                  WHERE b.doc_id = ${document.id}">
     <field column="community_tag_ids" name="community_tag_ids"/>
     <field column="community_tags" splitBy="," />
   </entity>

 </entity>

 The value for field community_tags comes correctly as an array of strings.
 However the value of field community_tag_ids is not proper

 <arr name="community_tag_ids">
   <int>[B@390c0a18</int>
 </arr>

 I tried chaining NumberFormatTransformer with formatStyle=number but that
 throws DataImportHandlerException: Failed to apply NumberFormat on column.
 Could it be due to NULL values from database or because the value is not
 proper? How do we handle NULL in this case?


 *Pranav Prakash*

 temet nosce




Can I get DIH skip fields that match empty text nodes

2012-07-18 Thread Alexandre Rafalovitch
Hello,

I have DIH reading an XML file and getting fields with empty values.
My definition is:
<field column="title" xpath="/database/document/item[@name='Title']/text"/>

/text here is an actual node name, not text() (e.g. <item
name='Title'><text/></item>)

Right now, I get the field (of type string) with empty value
indexed/stored/returned. Plus, all the copy fields get the empties as
well.

Can I get DIH to skip that field if I don't have any actual text in
it? I can see how to do it with a custom transformer, but it seems that
this would be a common problem and I might just be missing a setting
or some XPath secret.

I actually tried [node()],  [text()] and .../text/text() at the end,
but that seems to make the XPathEntityProcessor skip the field
altogether.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)
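
In case it helps, the custom-transformer route does not require Java: DIH's
ScriptTransformer can drop empty columns with a few lines of JavaScript in
data-config.xml. A sketch; the field name matches the one above, and the url
and forEach attributes are placeholders for whatever the existing entity
already uses:

  <dataConfig>
    <script><![CDATA[
      function stripEmpty(row) {
        var title = row.get('title');
        if (title == null || title.toString().trim().length() == 0) {
          row.remove('title');   // dropped fields (and their copyFields) are not indexed
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="doc" processor="XPathEntityProcessor"
              url="data.xml" forEach="/database/document"
              transformer="script:stripEmpty">
        <field column="title" xpath="/database/document/item[@name='Title']/text"/>
      </entity>
    </document>
  </dataConfig>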


Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-18 Thread santamaria2
Very interesting! Thanks for sharing, I'll ponder on it.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995899.html
Sent from the Solr - User mailing list archive at Nabble.com.