Re: Apache Lucene Eurocon 2012
Hi Chris, thanks for your response. Ok, we will wait :)

Best Regards
Vadim

2012/3/8 Chris Hostetter hossman_luc...@fucit.org :

: where and when is the next Eurocon scheduled?
: I read something about Denmark and autumn 2012 (i don't know where *g*).

I do not know where, but sometime in the fall is probably the correct time frame. I believe the details will be announced at Lucene Revolution...

http://lucenerevolution.org/

(that's what happened last year)

-Hoss
RE: Custom Sharding on solrcloud
Hi,

If I remove the DistributedUpdateProcessorFactory I will have to manage a master/slave setup myself, updating solely to the master and replicating to any slave. I wonder, is it possible to have distributed updates but confined to the sub-set of cores and replicas within a collection that share the same name?

Phil

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: 08 March 2012 01:02
To: solr-user@lucene.apache.org
Subject: Re: Custom Sharding on solrcloud

Hi Phil - The default update chain now includes the distributed update processor by default - and if in solrcloud mode it will be active. Probably, what you want to do is define your own update chain (see the wiki). Then you can add that update chain as the default for your json update handler in solrconfig.xml.

<!-- referencing it in an update handler -->
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>

The default chain is:

new LogUpdateProcessorFactory(), new DistributedUpdateProcessorFactory(), new RunUpdateProcessorFactory()

So just use Log and Run instead to get your old behavior.

- Mark

On Mar 7, 2012, at 1:37 PM, Phil Hoy wrote:

Hi,

We have a large index and would like to shard by a particular field value, in our case surname. This way we can scale out to multiple machines, yet as most queries filter on surname we can use some application logic to hit just the one core to get the results we need. Furthermore, as we anticipate the index will grow over time, it makes sense (to us) to host a number of shards on a single machine until they get too big, at which point we can then move them to another machine.

We are using solrcloud, and it is set up using a solr core per shard; that way we can direct both queries and updates to the appropriate core/shard. To do this our solr.xml looks a bit like this:

<cores defaultCoreName="default" adminPath="/admin/cores" zkClientTimeout="1" hostPort="8983">
  <core shard="default" name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
  <core shard="aaa-ava" name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
  <core shard="avb-bel" name="avb-bel" instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
  ...

Directed updates via: http://server/solr/aaa-ava/update/json [{"surname":"adams"}]
Directed queries via: http://server/solr/select?q=surname:adams&shards=aaa-ava

This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13, before the more recent solrcloud changes, but now the update is not directed to the appropriate core. Is there a better way to achieve our needs?

Phil

- Mark Miller lucidimagination.com
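A minimal sketch of such a chain in solrconfig.xml, following Mark's suggestion (the name "mychain" matches the handler default quoted above; this is just the default chain minus the distributed processor, not a complete configuration):

<updateRequestProcessorChain name="mychain">
  <!-- same as the default chain, but without DistributedUpdateProcessorFactory -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>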
Re: indexing cpu utilization
On 8 March 2012 15:39, gabriel shen xshco...@gmail.com wrote:
> Hi, I noticed that sequential indexing on 1 solr core is only using 40% of our 8 virtual core CPU power. Why isn't it using 100% of the power? Is there a way to increase the CPU utilization rate?
[...]

This is an open-ended question which could be due to a variety of things, and also depends on how you are indexing. Your indexing process might be I/O bound (quite possible), or memory bound, rather than CPU bound.

Regards,
Gora
Re: indexing cpu utilization
Our indexing process adds a bundle of solr documents (for example 5000) to solr each time, and we observed that before committing (which might be I/O bound) it constantly uses less than half the CPU capacity, which seems strange to us: why doesn't it use full CPU power? As for RAM, I don't know how much it will affect CPU utilization; we have assigned 14gb to the solr tomcat server on a 32 gb linux machine.

best regards,
shen

On Thu, Mar 8, 2012 at 11:27 AM, Gora Mohanty g...@mimirtech.com wrote:
> This is an open-ended question which could be due to a variety of things, and also depends on how you are indexing. Your indexing process might be I/O bound (quite possible), or memory bound, rather than CPU bound.
> Regards,
> Gora
Moving from Multiple webapps to Multi Cores -Solr 1.3
Hello All,

On prototyping the move from solr multiple webapps to solr multi cores [1.3 version both], I am running into the following issues and questions:

1) We are primarily moving to multicore because we saw the permgen memory increase each time we created a new solr webapp, so the assumption is that by moving to multicore and sharing the same war file we will not increase the permgen memory when we create a new core. However, I do see about a 190kb increase when a new core is created, as opposed to about 13mb per new webapp. Does the permgen memory get consumed/increased per core creation, with some benefit over webapp creation?

2) We have schemas for multiple languages, and I wanted to create a webapp per language and create cores for each client with the same lang requirement, with a shared schema. Would that be affected if we want to add some dynamic fields to some cores [of course the indexes are separate]? Does this approach make sense, or can we just create n cores in a single webapp with different schemas?

3) In terms of query time, when I query a webapp for a particular core, should I expect the QTime to come down or remain the same?

4) On using the create command as:

multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDir=/searchinstances/multi_core_prototype/solr/coreX&config=/searchinstances/multi_core_prototype/solr/coreX&schema=/searchinstances/multi_core_prototype/solr/core0/conf/schema.xml&dataDir=/searchinstances/multi_core_prototype/solr/coreX/data

My directory structure is:

tomcat5.5
  Searchinstances
    multi_core_prototype
      solr.war
      solr
        solr.xml
        core0
          data
          conf
        core1
          conf
          data

On the above command, with that instanceDir, coreX is created under solr with a data directory under coreX; however, I don't see a conf directory with schema and solrconfig under coreX. I am assuming that with the above command it copies them from the existing core0 conf folder. Let me know if I am missing anything here.

Thanks,
Sujatha
Re: How to exactly match fields which are multi-valued?
You haven't really given us much to go on here. Matches are just like a single-valued field, with the exception of the increment gap.

Say one entry were "large cat big dog" in a multi-valued field. Say the next document indexed two values, "large cat" and "big dog". And say the increment gap were 100. The token offsets for doc 1 would be 0, 1, 2, 3 and for doc 2 would be 0, 1, 101, 102.

The only effective difference is that phrase queries with slop less than 100 would NEVER match across multi-values. I.e. "cat big"~10 would match doc 1 but not doc 2.

Best
Erick

2012/3/7 SuoNayi suonayi2...@163.com:
> Hi all, how to offer exact-match capabilities on multi-valued fields? Any help is appreciated!
> SuoNayi
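For reference, the gap Erick describes is controlled by the positionIncrementGap attribute on the field type in schema.xml (a minimal sketch; the type and field names are illustrative):

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="tags" type="text_general" indexed="true" stored="true" multiValued="true"/>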
Re: Question about Streaming Update Solr Server
Could anyone reply to these questions? Thanks

2012/3/5 Anderson vasconcelos anderson.v...@gmail.com

Hi, I have some questions about StreamingUpdateSolrServer.

1) What's the queue size parameter? Is it the number of documents in each thread?
2) When I configured it like StreamingUpdateSolrServer(URL, 1000, 5), indexing runs ok. But when I raise the number of threads, like new StreamingUpdateSolrServer(URL, 1000, 15), I receive a java.net.SocketException: Broken pipe. Why?
3) When I index using the addBean method, it opens the maximum number of threads that I configured. But when I use addBeans, it opens only one thread. Is this correct?

Thanks
Re: solr geospatial / spatial4j
Yes, there are trunk nightly builds, see:
https://builds.apache.org//view/S-Z/view/Solr/job/Solr-trunk/

But I don't think LSP is in trunk at this point, so that's not useful. The code branch is on (I think)
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_3795_ls_spatial_playground
but I confess I haven't tried to get and build it all; I'm not quite sure what's needed.

Best
Erick

On Wed, Mar 7, 2012 at 10:25 AM, Matt Mitchell goodie...@gmail.com wrote:
> Hi, I'm researching options for handling a better geospatial solution. I'm currently using Solr 3.5 for a read-only database, and the point/radius searches work great. But I'd like to start doing point-in-polygon searches as well. I've skimmed through some of the geospatial jira issues, and read about spatial4j, which is very interesting. I see on the github page that this will soon be part of lucene; can anyone confirm this?
>
> I attempted to build the spatial4j demo but no luck. It had problems finding lucene 4.0-SNAPSHOT, which I guess is because there are no 4.0-SNAPSHOT nightly builds? If anyone knows how I can get around this, please let me know!
>
> Other than spatial4j, is there a way to do point-in-polygon searches with solr 3.5.0 right now? Is there some tricky indexing/querying strategy that would allow this?
>
> Thanks!
> - Matt
Re: How to limit the number of open searchers?
Ah, you're right. If your queries run across several commits you'll get multiple searchers open. I don't know of any good way to do what you want. I'm curious, why can't you do a master/slave setup? The other thing to think about would be the NRT stuff, if you can run trunk.

Best
Erick

On Wed, Mar 7, 2012 at 2:30 PM, Michael Ryan mr...@moreover.com wrote:
> > Unless you have warming happening, there should only be a single searcher open at any given time. So it seems to me that maxWarmingSearchers should give you what you need.
>
> What I'm seeing is that if a query takes a very long time to run, and runs across the duration of multiple commits (I know, that itself sounds bad!), I can get into a situation where I have 2 searchers in use and 1 searcher warming, rather than 1 searcher in use and 1 searcher warming. Due to all the memory-intensive features I use, having 3 or more searchers open can cause an OutOfMemoryError.
>
> I'm not using master/slave for this application, so can't go that route. I'd like a way to see how many searchers are currently open that is external to Solr. This would allow me to block my commits until I see that there is only 1 searcher currently open. I could use JMX, but that feels like overkill - wondering if there is something simpler.
>
> -Michael
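For what it's worth, the JMX route Michael mentions could be sketched like this (assuming <jmx/> is enabled in solrconfig.xml and the container exposes a remote JMX port; the "solr:type=searcher,*" ObjectName pattern is an assumption that varies by version and setup, so verify the real names in JConsole first):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SearcherCount {
    public static void main(String[] args) throws Exception {
        // standard JMX remote URL; host/port are whatever the container exposes
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        MBeanServerConnection conn =
            JMXConnectorFactory.connect(url).getMBeanServerConnection();
        // hypothetical pattern -- one MBean per live searcher; the exact
        // domain and keys depend on your Solr version and configuration
        Set<ObjectName> searchers =
            conn.queryNames(new ObjectName("solr:type=searcher,*"), null);
        System.out.println("Open searchers: " + searchers.size());
    }
}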
Re: indexing cpu utilization
How are you sending documents to solr?

If you push solr input documents via HTTP (which is what SolrJ does), you could increase CPU consumption (and therefore reduce indexing time) by sending your update requests asynchronously, using multiple updating threads, to your single solr core. Somebody more familiar than me with the update chain could probably tell you more, but I think each update request is handled inside a single thread on the server side. If that's correct, then you can increase CPU consumption on your indexing host by adding more updating threads (to the client pushing documents to your solr core).

Also make sure you don't ask solr to commit your pending changes to the solr index too frequently (on each add), but only when you want changes to be taken into account on the searching side. I personally like to let solr do autoCommits, using a combo of max-added-documents and elapsed-time conditions for the auto commit policy.

Considering indexing bottlenecks more generally, my experience in that field is that indexing speed is usually bound to, in frequency order:
- source enumeration speed (especially if solr input documents are made out of complex joins on a remote DB)
- network I/O, if performing remote indexing and the network link isn't adapted to the amount of data running through it
- disk I/O, if you commit very often and rely on commodity SATA HDDs, or if another process is stressing the poor little device (keep the 150 IOPS limit in mind for sata devices)
- CPU, if you were able to get rid of the previous bottlenecks
- memory doesn't play the same role in indexing speed as the other factors; from my point of view it would only be a limit if you perform complex analysis on many, many fields, and if that becomes a problem it is easy to spot with JMX and JConsole, because your JVM would then be performing many GCs and the process's resident RAM usage would be close to whatever was set as -Xmx.

I don't know if I was really clear; all I can say is that increasing the number of clients pushing updates to solr in parallel was the easiest way for me to reduce the indexing time for large update batches.

Hope this helps,
-- Tanguy

On 08/03/2012 11:48, gabriel shen wrote:
> Our indexing process adds a bundle of solr documents (for example 5000) to solr each time, and we observed that before committing (which might be I/O bound) it constantly uses less than half the CPU capacity, which seems strange to us: why doesn't it use full CPU power? As for RAM, I don't know how much it will affect CPU utilization; we have assigned 14gb to the solr tomcat server on a 32 gb linux machine.
> [...]
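To illustrate the multiple-updating-threads point, here is a minimal SolrJ sketch against the 3.x-era API (the URL, batch size, and thread count are made-up example values, not recommendations):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // CommonsHttpSolrServer is thread-safe, so one instance can be shared
        final SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ExecutorService pool = Executors.newFixedThreadPool(4); // 4 concurrent updating threads
        for (int batch = 0; batch < 100; batch++) {
            final int b = batch;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
                        for (int i = 0; i < 5000; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", b + "-" + i);
                            docs.add(doc);
                        }
                        solr.add(docs); // no explicit commit: let autoCommit handle it
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}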
Re: Stemmer Question
> I was previously using the PorterStemmer to do stemming and ran into an issue where it was overly aggressive with some words or abbreviations, which I needed to stop. I have recently switched to KStem and I believe the issue is lessened, but I was still wondering if there was a way to set a number of stop words for which you didn't want stemming to occur, or if there was a way to tell the stemmer to store the unstemmed version as well.
>
> So for instance, if a query came in for Ahmed, the PorterStemmer would turn that into Ahm, while in this case Ahmed is a name and I want to search for it unstemmed. If there were a stop word list, I could attempt to compile a list of words I didn't want stemmed; or if there were a way to also create a token for the unstemmed word, then what went into the index for Ahmed would be "ahmed ahm", so we'd cover both cases. What are the drawbacks of providing both?

StemmerOverrideFilterFactory and KeywordMarkerFilterFactory are used for these kinds of purposes.

http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming
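For instance, a field type that protects listed words from stemming might look like this (a sketch: protwords.txt is an assumed file name containing one word per line, and solr.KStemFilterFactory assumes a Solr version that bundles it, i.e. 3.3+):

<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokens matching protwords.txt are flagged as keywords and skipped by the stemmer -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>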
Re: indexing cpu utilization
On 8 March 2012 16:18, gabriel shen xshco...@gmail.com wrote:
> Our indexing process adds a bundle of solr documents (for example 5000) to solr each time, and we observed that before committing (which might be I/O bound) it constantly uses less than half the CPU capacity, which seems strange to us: why doesn't it use full CPU power? As for RAM, I don't know how much it will affect CPU utilization; we have assigned 14gb to the solr tomcat server on a 32 gb linux machine.
[...]

Are you hitting memory limits? As Tanguy has already pointed out in nice detail, it probably also matters how you push documents to Solr, and how often you commit.

In an apples-to-oranges comparison, we used to run a large indexing task, but with only a single commit at the end, while it sounds as if you are using smaller batches with more frequent commits. In our case, we could max out CPU usage (well, we backed off at ~85% utilisation on each core). Though we were fetching data over the network, it was a relatively high-bandwidth internal connection, and we were using DIH with multiple Solr cores.

Regards,
Gora
Understanding update handler statistics
Hi,

Trying to understand the update handler statistics, I have this:

commits : 2824
autocommit maxDocs : 1
autocommit maxTime : 1000ms
autocommits : 41
optimizes : 822
rollbacks : 0
expungeDeletes : 0
docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 17457
cumulative_deletesById : 1959
cumulative_deletesByQuery : 0
cumulative_errors : 0

My problem is with the cumulative part. If, for instance, I am doing a commit after each add and delete operation, then the sum of cumulative_adds plus cumulative_deletes plus cumulative_errors should match the commit number. Is that right?

And another question: are these stats since SOLR instance startup or since update handler startup? These can differ, as far as I understand. And from this part:

docsPending : 0
adds : 0
deletesById : 0
deletesByQuery : 0
errors : 0

I understand that if I had docsPending I should have adds (pending) and deletes* (pending), but how could I have errors...

thanks
stelios
Re: wildcard queries with edismax and lucene query parsers
Any help on this? I am really stuck on a client project. I need to know how scoring works with wildcard queries under SOLR 3.2.

Thanks
Bob

On Mon, Mar 5, 2012 at 4:22 PM, Robert Stewart bstewart...@gmail.com wrote:

How is scoring affected by wildcard queries? It seems when I use a wildcard query I get all constant scores in the response (all scores = 1.0). That occurs with both edismax and the lucene query parser. I am trying to implement an auto-suggest feature, so I need to use a wildcard to return all results that match the prefix entered by a user. But I want the results sorted according to the score defined by the qf parameter in my search handler.

?defType=edismax&q=grow*&fl=title,score

<result name="response" numFound="11" start="0" maxScore="1.0">
  <doc>
    <float name="score">1.0</float>
    <arr name="title"><str>SP 1000 Growth</str></arr>
  </doc>
  <doc>
    <float name="score">1.0</float>
    <arr name="title"><str>SP 1000 Pure Growth</str></arr>
  </doc>

?defType=lucene&q=grow*&fl=title,score

<result name="response" numFound="11" start="0" maxScore="1.0">
  <doc>
    <float name="score">1.0</float>
    <arr name="title"><str>SP 1000 Growth</str></arr>
  </doc>
  <doc>
    <float name="score">1.0</float>
    <arr name="title"><str>SP 1000 Pure Growth</str></arr>
  </doc>

If I use a query with no wildcard, scoring appears correct:

?defType=edismax&q=growth&fl=title,score

<result name="response" numFound="11" start="0" maxScore="0.7500377">
  <doc>
    <float name="score">0.7500377</float>
    <arr name="title"><str>SP 1000 Growth</str></arr>
  </doc>
  <doc>
    <float name="score">0.7500377</float>
    <arr name="title"><str>SP 500 Growth</str></arr>
  </doc>
  <doc>
    <float name="score">0.656283</float>
    <arr name="title"><str>SP 1000 Pure Growth</str></arr>
  </doc>

I am using SOLR version 3.2 and a request handler defined like this:

<requestHandler name="/idxsuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="q">*:*</str>
    <str name="qf">ticker^10.0 indexCode^10.0 indexKey^10.0 title^5.0 indexName^5.0</str>
    <str name="fl">indexId,indexName,indexCode,indexKey,title,ticker,urlTitle</str>
  </lst>
  <lst name="appends">
    <!-- Filter out documents that are not published yet and that are not yet expired -->
    <str name="fq">+contentType:IndexProfile</str>
  </lst>
</requestHandler>
Re: wildcard queries with edismax and lucene query parsers
WildcardQueries are wrapped into ConstantScoreQuery. I would create a copy field of these fields using the following field type. Then you can search on these copyFields (qf). With this approach you don't need to use the star operator:

defType=edismax&q=grow&fl=title,score

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

--- On Thu, 3/8/12, Robert Stewart bstewart...@gmail.com wrote:

> From: Robert Stewart bstewart...@gmail.com
> Subject: Re: wildcard queries with edismax and lucene query parsers
> To: solr-user@lucene.apache.org
> Date: Thursday, March 8, 2012, 4:21 PM
>
> Any help on this? I am really stuck on a client project. I need to know how scoring works with wildcard queries under SOLR 3.2.
> [...]
Re: Understanding update handler statistics
On 3/8/2012 7:02 AM, stetogias wrote:
> Trying to understand the update handler statistics...
> [...]
> My problem is with the cumulative part. If, for instance, I am doing a commit after each add and delete operation, then the sum of cumulative_adds plus cumulative_deletes plus cumulative_errors should match the commit number. Is that right?
> [...]
> I understand that if I had docsPending I should have adds (pending) and deletes* (pending), but how could I have errors...

I'm fairly sure that adds and deletes refer to the number of documents added or deleted. You can have many documents added and/or deleted for each commit. I would not expect the sums to match, unless you are adding or deleting only one document at a time and doing a commit after every one. I hope you're not doing that, unless you're using trunk with the near-realtime feature and doing soft commits, with which I have no experience. Normally, doing a commit after every document would be too much of a load for good performance, unless there is a relatively long time period between each add or delete.

Your question about errors - that probably tracks the number of times that the update handler returned an error response, though I don't really know. If I'm right, then that number, like commits, has little to do with the number of documents.

Thanks,
Shawn
Re: wildcard queries with edismax and lucene query parsers
Ahmet,

That is a great idea. I will try it. Thank you.

On Thu, Mar 8, 2012 at 9:34 AM, Ahmet Arslan iori...@yahoo.com wrote:
> WildcardQueries are wrapped into ConstantScoreQuery. I would create a copy field of these fields using the following field type. Then you can search on these copyFields (qf). With this approach you don't need to use the star operator.
> [...]
Re: Stemmer Question
Thanks, the KeywordMarkerFilterFactory seems to be what I was looking for.

I'm still wondering about keeping the unstemmed word as a token, though. While I know that this would increase the index size slightly, I wonder what the negative of doing such a thing would be? It just seems less destructive, since I would always store both the unstemmed version and the stemmed version; by not storing the unstemmed version, there is no way to go back without reindexing. If I wanted to implement this, I'm assuming a custom tokenizer would be most appropriate? Does something like this already exist?

On Thu, Mar 8, 2012 at 8:36 AM, Ahmet Arslan iori...@yahoo.com wrote:
> StemmerOverrideFilterFactory and KeywordMarkerFilterFactory are used for these kinds of purposes.
> http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming
Importing dynamicField data on the fly
Hello Everyone,

I'm trying to work out how, if at all possible, dynamicFields can be imported from a dynamic data source through the DataImportHandler configuration. Currently the DataImportHandler configuration file requires me to name every single field I want to map in advance, but I do not necessarily know the dynamicField set at this stage.

Here's my example schema.xml dynamic field definition:

<dynamicField name="*_sortable" type="alphaOnlySort" indexed="true" stored="true"/>

My DataImportHandler import configuration file looks like this:

<dataConfig>
  <dataSource name="Gateway1Source" type="HttpDataSource" baseUrl="http://acproplatforms.internal/feeds.xml" encoding="UTF-8" connectionTimeout="15000" readTimeout="15000"/>
  <document name="feeds">
    <entity name="feed" processor="XPathEntityProcessor" stream="true" forEach="/gateway/feedItem" url="">
      <field column="type" xpath="/gateway/feedItem/type"/>
      ...
    </entity>
  </document>
</dataConfig>

I have looked, very optimistically, at script transformers (transformer="script:importDynamics"), specifically hoping the row in the transformer function would hold the dynamic field content, but this was silly thinking obviously, as the fields would already have fallen through had they made it into here.

Has anyone managed to import into dynamic fields without knowing in advance what they were going to be in the data source? To give you an idea of why I want this: there's an application aggregating web services from many sources, some of which contain patterns of fields I know we'll want, and the nature of their data types, but which are added to quite frequently. It seems that, aside from the field mappings here, the hard work has been done in Solr to achieve this!

Kindest Regards,
Mark
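For what it's worth, a script transformer can populate dynamic fields, but only from columns already mapped into the row, which is exactly the fall-through limitation described above. A sketch (the function body and the copied column are illustrative only):

<script><![CDATA[
  function importDynamics(row) {
    // copy an already-mapped column into a *_sortable dynamic field;
    // unmapped source columns never reach the row, so this cannot
    // discover fields that were not declared in the entity
    var type = row.get('type');
    if (type != null) {
      row.put('type_sortable', type);
    }
    return row;
  }
]]></script>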
Re: Stemmer Question
> Thanks, the KeywordMarkerFilterFactory seems to be what I was looking for. I'm still wondering about keeping the unstemmed word as a token, though.
> [...]
> If I wanted to implement this, I'm assuming a custom tokenizer would be most appropriate? Does something like this already exist?

Not out-of-the-box. Actually, I was using your idea: I implemented such a custom token filter by mixing the synonym filter and a stem filter. This is useful for wildcard queries. And for normal queries, it can rank exact matches higher.
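A rough sketch of such a filter against the Lucene 3.x analysis API (the stem() method here is a placeholder for whatever stemmer the filter delegates to, not a real Lucene call):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class PreserveOriginalStemFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posAtt = addAttribute(PositionIncrementAttribute.class);
    private State pending;   // original-token state, queued while the unstemmed form is emitted
    private String stemmed;

    public PreserveOriginalStemFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pending != null) {
            // emit the stemmed variant at the same position as the original
            restoreState(pending);
            pending = null;
            termAtt.setEmpty().append(stemmed);
            posAtt.setPositionIncrement(0);
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        String original = termAtt.toString();
        String stem = stem(original);
        if (!stem.equals(original)) {
            pending = captureState();   // queue a second emission for this position
            stemmed = stem;
        }
        return true;                    // the original token passes through unchanged
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pending = null;
    }

    private String stem(String term) {
        return term; // stub -- wire in a real stemmer (KStem, Porter, ...) here
    }
}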
Re: solr geospatial / spatial4j
On Wed, Mar 7, 2012 at 7:25 AM, Matt Mitchell goodie...@gmail.com wrote:
> Hi, I'm researching options for handling a better geospatial solution. I'm currently using Solr 3.5 for a read-only database, and the point/radius searches work great. But I'd like to start doing point-in-polygon searches as well. I've skimmed through some of the geospatial jira issues, and read about spatial4j, which is very interesting. I see on the github page that this will soon be part of lucene; can anyone confirm this?

perhaps -- see the discussion on:
https://issues.apache.org/jira/browse/LUCENE-3795

This will involve a few steps before it is actually integrated with the lucene project -- and then a few more to be usable from solr.

> I attempted to build the spatial4j demo but no luck. It had problems finding lucene 4.0-SNAPSHOT, which I guess is because there are no 4.0-SNAPSHOT nightly builds? If anyone knows how I can get around this, please let me know!

ya they are published -- you just have to specify where you want to pull them from. If you use the 'updateLucene' profile, it will pull them from:
https://repository.apache.org/content/groups/snapshots/

use: mvn clean install -P updateLucene

> Other than spatial4j, is there a way to do point-in-polygon searches with solr 3.5.0 right now? Is there some tricky indexing/querying strategy that would allow this?

I don't know of anything else -- and note that the polygon stuff has a ways to go before it is generally ready for prime-time.

ryan
Re: Solr-Lucene compatibility
: I have an app that writes lucene indexes and is based on lucene 2.3.0.
:
: Can I read those indexes using solr 3.5.0 and perform a distributed search?
: Or should I use a lower version of solr, so that the index reader is
: compatible with the index writer.

a) Lucene 2.3.0 is pretty damn ancient ... i would strongly recommend you upgrade to get a lot of bug fixes and performance improvements.

b) in general, writing indexes with Lucene and searching them with (a compatible version of) Solr should work fine -- provided the schema.xml you configure Solr with matches up with how you've built your index.

-Hoss
Re: How to exactly match fields which are multi-valued?
Well, if you really want EXACT exact, just use a KeywordTokenizer (i.e., don't tokenize at all). But then matches will really have to be EXACT, including punctuation, whitespace, diacritics, etc., and a query will only match if it 'exactly' matches one value in your multi-valued field. You could try a KeywordTokenizer with some normalization too.

Either way, though, if you're issuing a query against a field tokenized with KeywordTokenizer whose values can include whitespace, you really need to issue it as a _phrase query_, to avoid being messed up by the lucene or dismax query parser's pre-tokenization. Which is potentially fine; that's what you want to do anyway for 'exact match'. Except if you wanted to use multiple dismax qf's with just a BOOST on the 'exact match', but _not_ a phrase query for the other fields... well, I can't figure out any way to do that with this technique.

It gets tricky; I haven't found a great solution.

On 3/8/2012 7:44 AM, Erick Erickson wrote:
> You haven't really given us much to go on here. Matches are just like a single-valued field, with the exception of the increment gap.
> [...]
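Concretely, a normalized exact-match type along the lines described above might look like this (a sketch; the type name and filter choices are illustrative), with queries issued as phrases, e.g. q=myfield:"large cat":

<fieldType name="exactish" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keep each value as a single token, then normalize lightly -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>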
Re: Filter facet_fields with Solr similar to stopwords
: I am using a solr.StopFilterFactory in a query filter for a text_general
: field (here: content). It works fine; when I query the field for the
: stopword, I get no results.
...
: used in the text. What I am trying to achieve is to also filter the
: stopwords from the facet_fields, but it's not working. It would only work
: if the stopwords are also filtered during the indexing of the text_general
: field, right?
...
: My current solution is to 'filter' with code after retrieving the
: facet_fields from Solr. But is there a Solr-based way to do this niftier?

Not really. facet.field works based on the terms in the index -- if the term is in the index, and it's in the documents matching your query, you are going to get counts back for it.

-Hoss
Re: Retrieving multiple levels with hierarchical faceting in Solr
: I've found a couple of discussions online that suggest I ought to be
: able to set the prefix using local params:
:
: facet.field={!prefix=0;}foo
: facet.field={!prefix=1_foovalue; key=bar}foo

citation please? as far as i know that has never been implemented, but the idea was floated around as a hypothetical.

There is an open feature request for this type of logic, and it has a patch, but that patch doesn't work against any recent version (contributions to get it up to snuff would certainly be welcome)...

https://issues.apache.org/jira/browse/SOLR-1351
https://issues.apache.org/jira/browse/SOLR-2251

-Hoss
Re: maxClauseCount Exception
: I am suddenly getting a maxClauseCount exception for no reason. I am
: using Solr 3.5. I have only 206 documents in my index.

Unless things have changed, the reason you are seeing this is because _highlighting_ a query (clause) like type_s:[* TO *] requires rewriting it into a giant boolean query of all the terms in that field -- so even if you only have 206 docs, if you have more than 1024 values in that field in your index, you're going to go over the 1024-term limit. (You don't get this problem in a basic query, because it doesn't need to enumerate all the terms; it rewrites it to a ConstantScoreQuery.)

What you most likely want to do is move some of those clauses, like type_s:[* TO *] and usergroup_sm:admin, out of your main q query and into fq filters ... so they can be cached independently, won't contribute to scoring (just matching), and won't be used in highlighting.

: params={hl=true&hl.snippets=4&hl.simple.pre=<b></b>&fl=*,score&hl.mergeContiguous=true&hl.usePhraseHighlighter=true&hl.requireFieldMatch=true&echoParams=all&hl.fl=text_t&q={!lucene+q.op%3DOR+df%3Dtext_t}+(+kind_s:doc+OR+kind_s:xml)+AND+(type_s:[*+TO+*])+AND+(usergroup_sm:admin)&rows=20&start=0&wt=javabin&version=2} hits=204 status=500 QTime=166 |#]
: [#|2012-02-22T13:40:13.131-0500|SEVERE|glassfish3.1.1|
: org.apache.solr.servlet.SolrDispatchFilter|
: _ThreadID=22;_ThreadName=Thread-2;|org.apache.lucene.search.BooleanQuery
: $TooManyClauses: maxClauseCount is set to 1024
:   at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
:   ...
:   at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:304)
:   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)

-Hoss
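Applied to the request quoted above, that restructuring would look roughly like this (a sketch, shown unencoded for readability):

q={!lucene q.op=OR df=text_t}(kind_s:doc OR kind_s:xml)
&fq=type_s:[* TO *]
&fq=usergroup_sm:admin
&hl=true&hl.fl=text_t&hl.snippets=4&rows=20

Only the q clauses are highlighted; the fq filters just restrict the result set and are cached independently.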
Re: two solr instances using one index
the two solr instances are used to provide failover. can i define the priority of the two instances?

------------------ Original ------------------
From: 我自己的邮箱 345804...@qq.com
Date: Thu, Mar 8, 2012 02:05 PM
To: solr-user solr-user@lucene.apache.org
Subject: two solr instances using one index

Hi, everyone

2 solr server nodes point to the same data directory (same index). Do the two solr instances work independently? I found it strange: one node (node0) can do a complex search (for example: q=disease&sort=dateCreated), but the other (node1), using the same search, reported out of memory (the java -Xmx4G is enough). And when I tried to start node1 first after killing node0, any complex search completed well (if I kept node0 running, I could never start node1 without a heap size error, which impacts the ability of node1 to perform complex searches). Has anybody met this problem before, and any idea about it?

ps: my solr version is 1.3
Re: indexing bigdata
Your question is really unanswerable; there are about a zillion factors that could influence the answer. I can index 5-7K docs/second, so for me it's efficient. Others can index only a fraction of that. It all depends... "Try it and see" is about the only way to answer.

Best
Erick

On Thu, Mar 8, 2012 at 1:35 PM, Sharath Jagannath shotsonclo...@gmail.com wrote:
> Is indexing around 30 million documents in a single solr instance efficient? Has somebody experimented with it? I am planning to use it for an autosuggest feature I am implementing, so I expect responses within a few milliseconds. Should I be looking at sharding?
>
> Thanks,
> Sharath
Re: How to index doc file in solr?
Have you looked at ExtractingRequestHandler (aka Solr Cell)? SolrJ? Tika? Perhaps if you defined the problem a bit more, we'd be able to give you more comprehensive answers.

Best
Erick

On Wed, Mar 7, 2012 at 12:14 AM, Rohan Ashok Kumbhar rohan_kumb...@infosys.com wrote:
> Hi, I would like to know how to index any document other than xml in SOLR? Any comments would be appreciated!
>
> Thanks,
> Rohan
Reporting tools
Are there any reporting tools out there, so I can analyze search term frequency, filter frequency, etc.?
Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes
All,

I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23, apache-solr-4.0-2012-02-29_09-07-30), sent some documents through Manifold using its crawler, and it looks like it's replicating fine once the documents are committed. This must be related to my environment somehow. Thanks for your help.

Regards,
Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson erickerick...@gmail.com wrote:
> Matt: Just for paranoia's sake, when I was playing around with this (the _version_ thing was one of my problems too) I removed the entire data directory as well as the zoo_data directory between experiments (and recreated just the data dir). This included various index.2012 files and the tlog directory, on the theory that *maybe* there was some confusion happening on startup with an already-wonky index. If you have the energy and tried that, it might be helpful information, but it may also be a total red herring.
>
> FWIW
> Erick
>
> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com wrote:
>> I'm assuming the windows configuration looked correct? Yeah, so far I can not spot any smoking gun... I'm confounded at the moment. I'll re-read through everything once more...
>> - Mark
Re: Stemmer Question
I'd be very interested to see how you did this, if it is available. Does this seem like something useful to the community at large?

On Thursday, March 8, 2012, Ahmet Arslan iori...@yahoo.com wrote:
> Not out-of-the-box. Actually, I was using your idea: I implemented such a custom token filter by mixing the synonym filter and a stem filter. This is useful for wildcard queries. And for normal queries, it can rank exact matches higher.
> [...]
Re: indexing bigdata
Ok, my bad. I should have put it in a better way: is it a good idea to have all 30M docs on a single instance, or should I consider a distributed set-up?

I have synthesized the data, configured the schema, and made suitable changes to the config. I have tested it with a smaller data-set on my laptop and have a good workflow set up. I do not have a big machine to test it on, and I wanted to make sure I have insight into either option before I decide to spin up an amazon instance.

Thanks,
Sharath

On Thu, Mar 8, 2012 at 6:18 PM, Erick Erickson erickerick...@gmail.com wrote:
> Your question is really unanswerable; there are about a zillion factors that could influence the answer. I can index 5-7K docs/second, so for me it's efficient. Others can index only a fraction of that. It all depends... "Try it and see" is about the only way to answer.
> [...]
Re: addBean method inserting multivalued values
I have not specified the multiValued attribute:

<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>

I have different integer properties in my java class; some are single integer values, some are integer arrays. What I want is: if the setter method expects a single integer, then the stored field must be single-valued. But all integer dynamic fields are being indexed as multivalued. Please note that this happens only when I use the addBeans method. If I construct a SolrDocument, then indexing works as expected.

On Wed, Feb 1, 2012 at 3:43 PM, darul daru...@gmail.com wrote:
> remove multiValued="true" in your schema.xml file?
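For comparison, a bean along these lines keeps single- and multi-valued integers apart by mapping them to different dynamic-field patterns (a sketch; the *_is suffix is an assumed pattern that would need its own multiValued="true" dynamicField declared in schema.xml):

import org.apache.solr.client.solrj.beans.Field;

public class Item {
    @Field("id")
    String id;

    @Field("rank_i")      // maps to the single-valued *_i dynamic field
    int rank;

    @Field("scores_is")   // assumed multiValued dynamic field pattern, e.g. *_is
    int[] scores;
}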