How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Gili Nachum
I've implemented a custom solr2solr ongoing unidirectional replication
mechanism.

A Replicator (acting as solrJ client), crawls documents from SolrCloud1 and
writes them to SolrCloud2 in batches.
The replicator crawl logic is to read documents with a time greater than or
equal to the time of the last replicated document.
Whenever a document is added/updated, I auto-update a tdate field
"last_updated_in_solr" using TimestampUpdateProcessorFactory.

My problem: When a client indexes a batch of 100 documents, all 100 docs
get the same "last_updated_in_solr" value. This makes my ongoing
replication check for new documents to replicate much more complex than if
the time value were unique.

1. Can I use some other processor to generate increasing unique values?
2. Can I use the internal _version_ field for this? is it guaranteed to be
monotonically increasing for the entire collection or only per document,
with each add/update?
Any other options?
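
For illustration, a minimal SolrJ sketch (not part of the original setup) of
the crawl step described above; the server URL, batch size, and the tie-break
sort on the "id" unique key are assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CrawlStep {
        // Fetch the next batch of documents whose timestamp is >= the last
        // replicated timestamp (an ISO-8601 date string such as
        // "2015-09-21T00:00:00Z").
        public static QueryResponse nextBatch(HttpSolrServer source, String lastTime)
                throws SolrServerException {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("last_updated_in_solr:[" + lastTime + " TO *]");
            q.setSort("last_updated_in_solr", SolrQuery.ORDER.asc);
            q.addSort("id", SolrQuery.ORDER.asc); // tie-break when timestamps collide
            q.setRows(100);
            return source.query(q);
        }
    }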

Schema.xml:

   <field name="last_updated_in_solr" type="tdate" indexed="true" stored="true" />

solrconfig.xml:

   <updateRequestProcessorChain name="timestamp">
      <processor class="solr.TimestampUpdateProcessorFactory">
         <str name="fieldName">last_updated_in_solr</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>

I know there's work on a built-in replication mechanism, but it's not yet
released.
Using Solr 4.7.2.


Re: Does more shards in core improve performance?

2015-09-21 Thread Toke Eskildsen
On Mon, 2015-09-21 at 10:13 +0800, Zheng Lin Edwin Yeo wrote:
> I didn't find any increase in indexing throughput by adding shards in the
> same machine.
> 
> However, I've managed to feed the index to Solr from more than one thread
> at a time. It can take up to 3 threads without affecting the indexing
> speed. Anything more than that, the CPU will hit 100%, and the indexing
> speed in all the threads will be reduced.

It is a bit surprising that the limit is 3 threads on an 8-core machine,
but I am happy to hear that your findings fit the overall theory.


Thank you for the verification,
Toke Eskildsen, State and University Library, Denmark




Re: ctargett commented on http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html

2015-09-21 Thread Steve Rowe
I logged into comments.a.o and then disabled emailing of comments to this
list.

When we set up the "solrcwiki" site on comments.apache.org, the requirement
was that the PMC chair be the (sole) manager, and though I am no longer
chair, I'm still the manager of the "solrcwiki" site for the ASF commenting
system.

Tomorrow I'll ask ASF Infra about whether the managership should be
transferred to the current PMC chair.  (If they don't care, I don't mind
continuing to manage it.)

On Mon, Sep 21, 2015 at 5:43 PM, Cassandra Targett 
wrote:

> Hey folks,
>
> I'm doing some experiments with other formats for the Ref Guide and playing
> around with options for comments. I didn't realize this old experiment from
> https://issues.apache.org/jira/browse/SOLR-4889 would send email - I'm
> talking to Steve Rowe to see if we can get that disabled.
>
> Cassandra
>
> On Mon, Sep 21, 2015 at 2:06 PM,  wrote:
>
> > Hello,
> > ctargett has commented on
> >
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html
> > .
> > You can find the comment here:
> >
> >
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html#comment_4535
> > Please note that if the comment contains a hyperlink, it must be
> > approved
> > before it is shown on the site.
> >
> > Below is the reply that was posted:
> > 
> > This is a test of the comments system.
> > 
> >
> > With regards,
> > Apache Solr Cwiki.
> >
> > You are receiving this email because you have subscribed to changes
> > for the solrcwiki site.
> > To stop receiving these emails, unsubscribe from the mailing list
> that
> > is providing these notifications.
> >
> >
>


Re: solr suggestion with copy-field

2015-09-21 Thread Alessandro Benedetti
There are always some steps to take care of in these situations:

1) Have you checked that your destination (copied) field is fine? Does it
contain what you expect? Have you investigated the indexed terms?

2) Have you built your suggester? It doesn't build on startup or onCommit
(reading your config), so you need to build it before you will see suggestions.

It is hard to believe this depends on your copyField; from the suggester's
perspective, it doesn't know whether the configured field is populated by a
copyField or not.
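
For example, assuming the handler path from the original question, the
suggester can be built explicitly with a request such as:

http://localhost:8983/solr/post/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester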

Cheers

2015-09-21 11:34 GMT+01:00 sara hajili :

> hi all
> I want to get suggestions from multiple fields in Solr.
> I added this to solrConfig:
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">FuzzyLookupFactory</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">suggestStr</str>
>     <str name="weightField">like_count</str>
>     <str name="suggestAnalyzerFieldType">string</str>
>     <str name="buildOnStartup">false</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>   <lst name="defaults">
>     <str name="suggest">true</str>
>     <str name="suggest.count">10</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> and add this to schema:
>
> <copyField source="..." dest="suggestStr"/>
>
> and
>
> <field name="..." indexed="true" stored="true" termVectors="true"/>
> <field name="..." indexed="true" stored="true" termVectors="true"/>
> <field name="..." indexed="true" stored="true" termVectors="true"/>
> <field name="suggestStr" type="string" indexed="true" stored="true"
> multiValued="true" termVectors="true" />
>
> but I didn't get any result from this suggest query:
>
> http://localhost:8983/solr/post/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=solr
>
> but when I used one field (not a copy field) I got an answer.
> How do I solve my problem with the copy field?
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr4.7: tlog replay has a major delay before it starts recovering the transaction log

2015-09-21 Thread Shalin Shekhar Mangar
Hi Jeff,

Comments inline:

On Mon, Sep 21, 2015 at 6:06 PM, Jeff Wu  wrote:
> Our environment runs Solr 4.7. We recently hit a core recovery failure and
> then it retried to recover from the tlog.
>
> We noticed that after "Recovery failed" was logged at 20:05:22, the Solr
> server waited a long time before it started tlog replay. During that time,
> we had about 32 cores doing such tlog replay. It took over 40 minutes to
> bring the whole service back.
>
> Some questions we want to know:
> 1. Is tlog replay a single-threaded activity? Can we configure multiple
> threads, since in our deployment we have 64 cores on each Solr
> server?

Each core gets a separate recovery thread, but each individual log
replay is single-threaded.

>
> 2. What might cause the tlog replay thread to wait for over 15 minutes
> before actual tlog replay?  The actual replay seems very quick.

Before tlog replay, the replica will replicate any missing index files
from the leader. I think that is what is causing the time between the
two log messages. You have INFO logging turned off so there are no
messages from the replication handler about it.

>
> 3. The last message "Log replay finished" does not tell which core it is
> finished. Given 32 cores to recover, we can not know which core the log is
> reporting.

Yeah, many such issues were fixed in recent 5.x releases where we use
MDC to log collection, shard, core etc for each message. Furthermore,
tlog replay progress/status is also logged since 5.0

>
> 4. We know 4.7 is pretty old; we'd like to know whether this is a known
> issue fixed in a later release. Any related JIRA?
>
> Line 4120: ERROR - 2015-09-16 20:05:22.396;
> org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again...
> (0) core=collection3_shard11_replica2
> WARN  - 2015-09-16 20:22:50.343;
> org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
> tlog{file=/mnt/solrdata1/solr/home/collection3_shard11_replica2/data/tlog/tlog.0120498
> refcount=2} active=true starting pos=25981
> WARN  - 2015-09-16 20:22:53.301;
> org.apache.solr.update.UpdateLog$LogReplayer; Log replay finished.
> recoveryInfo=RecoveryInfo{adds=914 deletes=215 deleteByQuery=0 errors=0
> positionOfStart=25981}
>
> Thank you all~



-- 
Regards,
Shalin Shekhar Mangar.


faceting is unusable slow since upgrade to 5.3.0

2015-09-21 Thread Uwe Reh

Hi,

our bibliographic index (~20M entries) runs fine with Solr 4.10.3.
With Solr 5.3, faceted searching is constantly, incredibly slow (~20
seconds).

Output of 'debugQuery' (component timings, ms):
   process: 17705.0
   query:   2.0
   facet:   17590.0  !!
   debug:   111.0


The 'fieldValueCache' seems to be unused (no inserts nor lookups) in 
Solr 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a 
cumulative_hitratio of 1.


- the behavior is the same, running Solr 5.3 on a copy of the old index
(luceneMatch=4.6) or a newly built index

- using 'facet.method=enum' makes no remarkable difference
- declaring 'docValues' (with reindexing) makes no remarkable difference
- 'softCommit' isn't used

My environment is
  OS: Solaris 5.11 on AMD64
  JDK: 1.8.0_25 and 1.8.0_60 (same behavior)
  JavaOpts: -Xmx10g -XX:+UseG1GC -XX:+AggressiveOpts
-XX:+UseLargePages -XX:LargePageSizeInBytes=2m


Any help/advice is welcome
Uwe


Re: solr4.7: leader core does not get elected to another active core after solr OS shutdown, known issue?

2015-09-21 Thread Jeff Wu
Hi Shalin, thank you for the response.

We waited much longer than the ZK session timeout, and it still did
not kick off any leader election for these down-but-still-leader cores.
That's the question I'm actually asking.

Our test scenario:

Each Solr server has 64 cores; they are all active, and all leader
cores.
Shut down the Linux OS.
Monitor clusterstate.json in ZK, waiting well past the ZK session timeout.
We noticed leader election happened for some cores, but still saw some down
cores remaining leaders.

2015-09-21 9:15 GMT-04:00 Shalin Shekhar Mangar :

> Hi Jeff,
>
> The leader election relies on ephemeral nodes in Zookeeper to detect
> when leader or other nodes have gone down (abruptly). These ephemeral
> nodes are automatically deleted by ZooKeeper after the ZK session
> timeout which is by default 30 seconds. So if you kill a node then it
> can take up to 30 seconds for the cluster to detect it and start a new
> leader election. This won't be necessary during a graceful shutdown
> because on shutdown the node will give up leader position so that a
> new one can be elected. You could tune the zk session timeout to a
> lower value but then it makes the cluster more sensitive to GC pauses
> which can also trigger new leader elections.
>
> On Mon, Sep 21, 2015 at 5:55 PM, Jeff Wu  wrote:
> > Our environment still runs Solr 4.7. Recently we noticed in a test that when
> > we stopped one Solr server (solr02, via OS shutdown), all the cores of
> > solr02 were shown as "down", but a few cores still remained leaders. After
> > that, we quickly saw all other servers still sending requests to
> > that down Solr server, and therefore we saw a lot of TCP waiting threads in
> > the thread pools of other Solr servers since solr02 was already down.
> >
> > "shard53":{
> > "range":"2666-2998",
> > "state":"active",
> > "replicas":{
> >   "core_node102":{
> > "state":"down",
> > "base_url":"https://solr02.myhost/solr;,
> > "core":"collection2_shard53_replica1",
> > "node_name":"https://solr02.myhost_solr;,
> > "leader":"true"},
> >   "core_node104":{
> > "state":"active",
> > "base_url":"https://solr04.myhost/solr;,
> > "core":"collection2_shard53_replica2",
> > "node_name":"https://solr04.myhost/solr_solr"}}},
> >
> > Is this a known bug in 4.7, fixed later on? Any reference JIRA we
> > can study? If the Solr service is stopped gracefully, we can see
> > leader election happen and the leader switch to another active core. But if
> > we just directly shut down a Solr machine's OS, we can reproduce in our
> > environment that some "down" cores remain "leader" in the ZK clusterstate.json.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Aman Tandon
Sure. thank you Upayavira

With Regards
Aman Tandon

On Mon, Sep 21, 2015 at 6:01 PM, Upayavira  wrote:

> You cannot do multi-valued fields with LatLonType fields. Therefore, if
> that is a need, you will have to investigate RPT fields.
>
> I'm not sure how you do distance boosting there, so I'd suggest you ask
> that as a separate question with a new title.
>
> Upayavira
>
> On Mon, Sep 21, 2015, at 01:27 PM, Aman Tandon wrote:
> > We are using LatLonType to use the gradual boosting / distance based
> > boosting of search results.
> >
> > With Regards
> > Aman Tandon
> >
> > On Mon, Sep 21, 2015 at 5:39 PM, Upayavira  wrote:
> >
> > > Aman,
> > >
> > > I cannot promise to answer questions promptly - like most people on
> this
> > > list, we answer if/when we have a gap in our workload.
> > >
> > > The reason you are getting the non multiValued field error is because
> > > your latlon field does not have multiValued="true" enabled.
> > >
> > > However, the field type definition notes that this field type does not
> > > support multivalued fields, so you're not gonna get anywhere with that
> > > route.
> > >
> > > Have you tried the location_rpt type?
> > > (solr.SpatialRecursivePrefixTreeFieldType). This is a newer, and as I
> > > understand it, far more flexible field type - for example, you can
> index
> > > shapes into it as well as locations.
> > >
> > > I'd suggest you read this page, and pay particular attention to
> mentions
> > > of RPT:
> > >
> > > https://cwiki.apache.org/confluence/display/solr/Spatial+Search
> > >
> > > Upayavira
> > >
> > > On Mon, Sep 21, 2015, at 10:36 AM, Aman Tandon wrote:
> > > > Upayavira, please help
> > > >
> > > > With Regards
> > > > Aman Tandon
> > > >
> > > > On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon <
> amantandon...@gmail.com>
> > > > wrote:
> > > >
> > > > > Error is
> > > > >
> > > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > > <response>
> > > > > <lst name="responseHeader"><int name="status">400</int><int
> > > > > name="QTime">28</int></lst><lst name="error"><str name="msg">ERROR:
> > > > > [doc=9474144846] multiple values encountered for non multiValued field
> > > > > latlon_0_coordinate: [11.0183, 11.0183]</str><int name="code">400</int></lst>
> > > > > </response>
> > > > >
> > > > > And my configuration is
> > > > >
> > > > > <field name="latlon" type="location" indexed="true" stored="true" />
> > > > >
> > > > > <fieldType name="location" class="solr.LatLonType"
> > > > > subFieldSuffix="_coordinate"/>
> > > > >
> > > > > <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"
> > > > > required="false" multiValued="false" />
> > > > >
> > > > >  how you know it is because of stored="true"?
> > > > >
> > > > > As Erick replied in the last mail thread,
> > > > > I'm not getting any multiple values in the _coordinate fields.
> > > However, I
> > > > > _do_ get the error if my dynamic *_coordinate field is set to
> > > > > stored="true".
> > > > >
> > > > > And stored="true" is mandatory for using the atomic updates.
> > > > >
> > > > > With Regards
> > > > > Aman Tandon
> > > > >
> > > > > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira  wrote:
> > > > >
> > > > >> Can you show the error you are getting, and how you know it is
> because
> > > > >> of stored="true"?
> > > > >>
> > > > >> Upayavira
> > > > >>
> > > > >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
> > > > >> > Hi Erick,
> > > > >> >
> > > > >> > I am getting the same error because my dynamic field
> *_coordinate is
> > > > >> > stored="true".
> > > > >> > How can I get rid of this error?
> > > > >> >
> > > > >> > And I have to use the atomic update. Please help!!
> > > > >> >
> > > > >> > With Regards
> > > > >> > Aman Tandon
> > > > >> >
> > > > >> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa <
> fgiac...@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hey Erick, I think that you were right, there was a mix in the
> > > > >> schemas and
> > > > >> > > that was generating the error on some of the documents.
> > > > >> > >
> > > > >> > > Thanks for the help guys!
> > > > >> > >
> > > > >> > >
> > > > >> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson <
> erickerick...@gmail.com
> > > >:
> > > > >> > >
> > > > >> > > > Hmmm, I just tried this with a 4.x build and I can update the
> > > > >> document
> > > > >> > > > multiple times without a problem. I just indexed the
> standard
> > > > >> exampledocs
> > > > >> > > > and then updated a doc like this (vidcard.xml was the base):
> > > > >> > > >
> > > > >> > > > <add>
> > > > >> > > > <doc>
> > > > >> > > >   <field name="id">EN7800GTX/2DHTV/256M</field>
> > > > >> > > >
> > > > >> > > >   <field name="name" update="set">eoe changed this
> > > > >> puppy</field>
> > > > >> > > >
> > > > >> > > >   </doc>
> > > > >> > > > </add>
> > > > >> > > >
> > > > >> > > > I'm not getting any multiple values in the _coordinate
> fields.
> > > > >> However, I
> > > > >> > > > _do_ get the error if my dynamic *_coordinate field is set
> to
> > > > >> > > > stored="true".
> > > > >> > > >
> > > > >> > > > Did you perhaps change this at some point? Whenever I
> change the
> > > > >> schema,
> > > > >> > > I
> > > > >> > > > try to 'rm -rf solr/collection/data' just to be sure I've
> > > purged all
> > > > >> > > traces
> > > > >> > > > of the former schema definition.

solr4.7: leader core does not get elected to another active core after solr OS shutdown, known issue?

2015-09-21 Thread Jeff Wu
Our environment still runs Solr 4.7. Recently we noticed in a test that when
we stopped one Solr server (solr02, via OS shutdown), all the cores of
solr02 were shown as "down", but a few cores still remained leaders. After
that, we quickly saw all other servers still sending requests to
that down Solr server, and therefore we saw a lot of TCP waiting threads in
the thread pools of other Solr servers since solr02 was already down.

"shard53":{
"range":"2666-2998",
"state":"active",
"replicas":{
  "core_node102":{
"state":"down",
"base_url":"https://solr02.myhost/solr;,
"core":"collection2_shard53_replica1",
"node_name":"https://solr02.myhost_solr;,
"leader":"true"},
  "core_node104":{
"state":"active",
"base_url":"https://solr04.myhost/solr;,
"core":"collection2_shard53_replica2",
"node_name":"https://solr04.myhost/solr_solr"}}},

Is this a known bug in 4.7, fixed later on? Any reference JIRA we
can study? If the Solr service is stopped gracefully, we can see
leader election happen and the leader switch to another active core. But if we
just directly shut down a Solr machine's OS, we can reproduce in our environment
that some "down" cores remain "leader" in the ZK clusterstate.json.


Re: Solr4.7: tlog replay has a major delay before it starts recovering the transaction log

2015-09-21 Thread Jeff Wu
>
> Before tlog replay, the replica will replicate any missing index files
> from the leader. I think that is what is causing the time between the
> two log messages. You have INFO logging turned off so there are no
> messages from the replication handler about it.


I did not monitor network throughput during that timeframe, and I
thought the first log line already showed that the peersync had failed, so I
am trying to understand where that time was spent.

Also, in our solr.log, I did not see lines reporting Recovery retry(1),
Recovery retry(2), Recovery give up, etc. in this log file before it told
us "tlog replay".

2015-09-21 9:07 GMT-04:00 Shalin Shekhar Mangar :

> Hi Jeff,
>
> Comments inline:
>
> On Mon, Sep 21, 2015 at 6:06 PM, Jeff Wu  wrote:
> > Our environment runs Solr 4.7. We recently hit a core recovery failure and
> > then it retried to recover from the tlog.
> >
> > We noticed that after "Recovery failed" was logged at 20:05:22, the Solr
> > server waited a long time before it started tlog replay. During that time,
> > we had about 32 cores doing such tlog replay. It took over 40 minutes to
> > bring the whole service back.
> >
> > Some questions we want to know:
> > 1. Is tlog replay a single-threaded activity? Can we configure multiple
> > threads, since in our deployment we have 64 cores on each Solr
> > server?
>
> Each core gets a separate recovery thread, but each individual log
> replay is single-threaded.
>
> >
> > 2. What might cause the tlog replay thread to wait for over 15 minutes
> > before actual tlog replay?  The actual replay seems very quick.
>
> Before tlog replay, the replica will replicate any missing index files
> from the leader. I think that is what is causing the time between the
> two log messages. You have INFO logging turned off so there are no
> messages from the replication handler about it.
>
> >
> > 3. The last message "Log replay finished" does not tell which core it is
> > finished. Given 32 cores to recover, we can not know which core the log
> is
> > reporting.
>
> Yeah, many such issues were fixed in recent 5.x releases where we use
> MDC to log collection, shard, core etc for each message. Furthermore,
> tlog replay progress/status is also logged since 5.0
>
> >
> > 4. We know 4.7 is pretty old; we'd like to know whether this is a known
> > issue fixed in a later release. Any related JIRA?
> >
> > Line 4120: ERROR - 2015-09-16 20:05:22.396;
> > org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again...
> > (0) core=collection3_shard11_replica2
> > WARN  - 2015-09-16 20:22:50.343;
> > org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
> >
> tlog{file=/mnt/solrdata1/solr/home/collection3_shard11_replica2/data/tlog/tlog.0120498
> > refcount=2} active=true starting pos=25981
> > WARN  - 2015-09-16 20:22:53.301;
> > org.apache.solr.update.UpdateLog$LogReplayer; Log replay finished.
> > recoveryInfo=RecoveryInfo{adds=914 deletes=215 deleteByQuery=0 errors=0
> > positionOfStart=25981}
> >
> > Thank you all~
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Jeff Wu
---
CSDL Beijing, China


solr suggestion with copy-field

2015-09-21 Thread sara hajili
hi all
I want to get suggestions from multiple fields in Solr.
I added this to solrConfig:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestStr</str>
    <str name="weightField">like_count</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

and add this to schema:

<copyField source="..." dest="suggestStr"/>

and

<field name="..." indexed="true" stored="true" termVectors="true"/>
<field name="..." indexed="true" stored="true" termVectors="true"/>
<field name="..." indexed="true" stored="true" termVectors="true"/>
<field name="suggestStr" type="string" indexed="true" stored="true"
multiValued="true" termVectors="true" />
but I didn't get any result from this suggest query:
http://localhost:8983/solr/post/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=solr

but when I used one field (not a copy field) I got an answer.
How do I solve my problem with the copy field?


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Upayavira
You cannot do multi-valued fields with LatLonType fields. Therefore, if
that is a need, you will have to investigate RPT fields.

I'm not sure how you do distance boosting there, so I'd suggest you ask
that as a separate question with a new title.

Upayavira

On Mon, Sep 21, 2015, at 01:27 PM, Aman Tandon wrote:
> We are using LatLonType to use the gradual boosting / distance based
> boosting of search results.
> 
> With Regards
> Aman Tandon
> 
> On Mon, Sep 21, 2015 at 5:39 PM, Upayavira  wrote:
> 
> > Aman,
> >
> > I cannot promise to answer questions promptly - like most people on this
> > list, we answer if/when we have a gap in our workload.
> >
> > The reason you are getting the non multiValued field error is because
> > your latlon field does not have multiValued="true" enabled.
> >
> > However, the field type definition notes that this field type does not
> > support multivalued fields, so you're not gonna get anywhere with that
> > route.
> >
> > Have you tried the location_rpt type?
> > (solr.SpatialRecursivePrefixTreeFieldType). This is a newer, and as I
> > understand it, far more flexible field type - for example, you can index
> > shapes into it as well as locations.
> >
> > I'd suggest you read this page, and pay particular attention to mentions
> > of RPT:
> >
> > https://cwiki.apache.org/confluence/display/solr/Spatial+Search
> >
> > Upayavira
> >
> > On Mon, Sep 21, 2015, at 10:36 AM, Aman Tandon wrote:
> > > Upayavira, please help
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon 
> > > wrote:
> > >
> > > > Error is
> > > >
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <response>
> > > > <lst name="responseHeader"><int name="status">400</int><int
> > > > name="QTime">28</int></lst><lst name="error"><str name="msg">ERROR:
> > > > [doc=9474144846] multiple values encountered for non multiValued field
> > > > latlon_0_coordinate: [11.0183, 11.0183]</str><int name="code">400</int></lst>
> > > > </response>
> > > >
> > > > And my configuration is
> > > >
> > > > <field name="latlon" type="location" indexed="true" stored="true" />
> > > >
> > > > <fieldType name="location" class="solr.LatLonType"
> > > > subFieldSuffix="_coordinate"/>
> > > >
> > > > <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"
> > > > required="false" multiValued="false" />
> > > >
> > > >  how you know it is because of stored="true"?
> > > >
> > > > As Erick replied in the last mail thread,
> > > > I'm not getting any multiple values in the _coordinate fields.
> > However, I
> > > > _do_ get the error if my dynamic *_coordinate field is set to
> > > > stored="true".
> > > >
> > > > And stored="true" is mandatory for using the atomic updates.
> > > >
> > > > With Regards
> > > > Aman Tandon
> > > >
> > > > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira  wrote:
> > > >
> > > >> Can you show the error you are getting, and how you know it is because
> > > >> of stored="true"?
> > > >>
> > > >> Upayavira
> > > >>
> > > >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
> > > >> > Hi Erick,
> > > >> >
> > > >> > I am getting the same error because my dynamic field *_coordinate is
> > > >> > stored="true".
> > > >> > How can I get rid of this error?
> > > >> >
> > > >> > And I have to use the atomic update. Please help!!
> > > >> >
> > > >> > With Regards
> > > >> > Aman Tandon
> > > >> >
> > > >> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa  > >
> > > >> > wrote:
> > > >> >
> > > >> > > Hey Erick, I think that you were right, there was a mix in the
> > > >> schemas and
> > > >> > > that was generating the error on some of the documents.
> > > >> > >
> > > >> > > Thanks for the help guys!
> > > >> > >
> > > >> > >
> > > >> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson  > >:
> > > >> > >
> > > >> > > > Hmmm, I just tried this with a 4.x build and I can update the
> > > >> document
> > > >> > > > multiple times without a problem. I just indexed the standard
> > > >> exampledocs
> > > >> > > > and then updated a doc like this (vidcard.xml was the base):
> > > >> > > >
> > > >> > > > <add>
> > > >> > > > <doc>
> > > >> > > >   <field name="id">EN7800GTX/2DHTV/256M</field>
> > > >> > > >
> > > >> > > >   <field name="name" update="set">eoe changed this
> > > >> puppy</field>
> > > >> > > >
> > > >> > > >   </doc>
> > > >> > > > </add>
> > > >> > > >
> > > >> > > > I'm not getting any multiple values in the _coordinate fields.
> > > >> However, I
> > > >> > > > _do_ get the error if my dynamic *_coordinate field is set to
> > > >> > > > stored="true".
> > > >> > > >
> > > >> > > > Did you perhaps change this at some point? Whenever I change the
> > > >> schema,
> > > >> > > I
> > > >> > > > try to 'rm -rf solr/collection/data' just to be sure I've
> > purged all
> > > >> > > traces
> > > >> > > > of the former schema definition.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Erick
> > > >> > > >
> > > >> > > >
> > > >> > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa <
> > fgiac...@gmail.com>
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > No, they are not declared explicitly.
> > > >> > > > >
> > > >> > > > > This is how they are created:
> > > >> > > > >
> > > >> > > > > <field name="latLong" type="location" indexed="true" stored="true"/>
> > > >> > > > >
> > > >> > > 

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-21 Thread Uwe Reh

Am 21.09.2015 um 15:16 schrieb Shalin Shekhar Mangar:

Can you post your complete facet request as well as the schema
definition of the field on which you are faceting?



Query:

http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.sort=count&echoParams=all&debugQuery=true




Schema (with docValue):

...
<field name="..." ... required="false" multiValued="true" docValues="true" />
<field name="..." ... required="false" multiValued="true" docValues="true" />
...

Schema (w/o docValue):

...
<field name="..." ... required="false" multiValued="true" docValues="true" />
<field name="..." ... required="false" multiValued="true" />
...




solrconfig:

...
<fieldValueCache class="solr.FastLRUCache" ... showItems="48" />
...
<requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
      <int name="rows">10</int>
      <str name="df">allfields</str>
      <str name="echoParams">none</str>
   </lst>
   <arr name="components">
      <str>query</str>
      <str>facet</str>
      <str>stats</str>
      <str>debug</str>
      <str>elevator</str>
   </arr>
</requestHandler>





Re: modular QueryParser in contrib

2015-09-21 Thread Jack Krupansky
Probably a reference to the so-called flex query parser:
https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html

Read:
https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html

The original Jira:
https://issues.apache.org/jira/browse/LUCENE-1567

This new query parser was dumped into Lucene some years ago, but I haven't
noticed any real activity or interest in it.
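
For reference, a minimal sketch of using it (the field names and query string
are placeholders; Lucene 4.10-era API):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
    import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
    import org.apache.lucene.search.Query;

    public class FlexParserDemo {
        public static void main(String[] args) throws QueryNodeException {
            // The flexible parser splits parsing (syntax -> query node tree),
            // processing (tree rewriting) and building (tree -> Lucene Query),
            // which is what makes it modular and customizable.
            StandardQueryParser parser = new StandardQueryParser(new StandardAnalyzer());
            Query query = parser.parse("title:(solr AND lucene)", "defaultField");
            System.out.println(query);
        }
    }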

-- Jack Krupansky

On Mon, Sep 21, 2015 at 6:36 AM, Dmitry Kan  wrote:

> Hello!
>
> Asked the question on IRC, mirroring it here too: In lucene level QP there
> is a comment
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParser.jj#L99
> pointing to some contrib query parser, that offers modularity and
> customizability.
>
> Can you point to what the exact class is?
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Aman Tandon
We are using LatLonType to use the gradual boosting / distance based
boosting of search results.

With Regards
Aman Tandon

On Mon, Sep 21, 2015 at 5:39 PM, Upayavira  wrote:

> Aman,
>
> I cannot promise to answer questions promptly - like most people on this
> list, we answer if/when we have a gap in our workload.
>
> The reason you are getting the non multiValued field error is because
> your latlon field does not have multiValued="true" enabled.
>
> However, the field type definition notes that this field type does not
> support multivalued fields, so you're not gonna get anywhere with that
> route.
>
> Have you tried the location_rpt type?
> (solr.SpatialRecursivePrefixTreeFieldType). This is a newer, and as I
> understand it, far more flexible field type - for example, you can index
> shapes into it as well as locations.
>
> I'd suggest you read this page, and pay particular attention to mentions
> of RPT:
>
> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
>
> Upayavira
>
> On Mon, Sep 21, 2015, at 10:36 AM, Aman Tandon wrote:
> > Upayavira, please help
> >
> > With Regards
> > Aman Tandon
> >
> > On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon 
> > wrote:
> >
> > > Error is
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <response>
> > > <lst name="responseHeader"><int name="status">400</int><int
> > > name="QTime">28</int></lst><lst name="error"><str name="msg">ERROR:
> > > [doc=9474144846] multiple values encountered for non multiValued field
> > > latlon_0_coordinate: [11.0183, 11.0183]</str><int name="code">400</int></lst>
> > > </response>
> > >
> > > And my configuration is
> > >
> > > <field name="latlon" type="location" indexed="true" stored="true" />
> > >
> > > <fieldType name="location" class="solr.LatLonType"
> > > subFieldSuffix="_coordinate"/>
> > >
> > > <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true"
> > > required="false" multiValued="false" />
> > >
> > >  how you know it is because of stored="true"?
> > >
> > > As Erick replied in the last mail thread,
> > > I'm not getting any multiple values in the _coordinate fields.
> However, I
> > > _do_ get the error if my dynamic *_coordinate field is set to
> > > stored="true".
> > >
> > > And stored="true" is mandatory for using the atomic updates.
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira  wrote:
> > >
> > >> Can you show the error you are getting, and how you know it is because
> > >> of stored="true"?
> > >>
> > >> Upayavira
> > >>
> > >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
> > >> > Hi Erick,
> > >> >
> > >> > I am getting the same error because my dynamic field *_coordinate is
> > >> > stored="true".
> > >> > How can I get rid of this error?
> > >> >
> > >> > And I have to use the atomic update. Please help!!
> > >> >
> > >> > With Regards
> > >> > Aman Tandon
> > >> >
> > >> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa  >
> > >> > wrote:
> > >> >
> > >> > > Hey Erick, I think that you were right, there was a mix in the
> > >> schemas and
> > >> > > that was generating the error on some of the documents.
> > >> > >
> > >> > > Thanks for the help guys!
> > >> > >
> > >> > >
> > >> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson  >:
> > >> > >
> > >> > > > Hmmm, I just tried this with a 4.x build and I can update the
> > >> document
> > >> > > > multiple times without a problem. I just indexed the standard
> > >> exampledocs
> > >> > > > and then updated a doc like this (vidcard.xml was the base):
> > >> > > >
> > >> > > > <add>
> > >> > > > <doc>
> > >> > > >   <field name="id">EN7800GTX/2DHTV/256M</field>
> > >> > > >
> > >> > > >   <field name="name" update="set">eoe changed this
> > >> puppy</field>
> > >> > > >
> > >> > > >   </doc>
> > >> > > > </add>
> > >> > > >
> > >> > > > I'm not getting any multiple values in the _coordinate fields.
> > >> However, I
> > >> > > > _do_ get the error if my dynamic *_coordinate field is set to
> > >> > > > stored="true".
> > >> > > >
> > >> > > > Did you perhaps change this at some point? Whenever I change the
> > >> schema,
> > >> > > I
> > >> > > > try to 'rm -rf solr/collection/data' just to be sure I've
> purged all
> > >> > > traces
> > >> > > > of the former schema definition.
> > >> > > >
> > >> > > > Best,
> > >> > > > Erick
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa <
> fgiac...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > > No, they are not declared explicitly.
> > >> > > > >
> > >> > > > > This is how they are created:
> > >> > > > >
> > >> > > > > <field name="latLong" type="location" indexed="true" stored="true"/>
> > >> > > > >
> > >> > > > > <dynamicField name="*_coordinate" type="tdouble" indexed="true"
> > >> > > > >  stored="false"/>
> > >> > > > >
> > >> > > > > <fieldType name="location" class="solr.LatLonType"
> > >> > > > > subFieldSuffix="_coordinate"/>
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > 2014-08-04 22:28 GMT-03:00 Michael Ryan :
> > >> > > > >
> > >> > > > > > Are the latLong_0_coordinate and latLong_1_coordinate fields
> > >> > > populated
> > >> > > > > > using copyField? If so, this sounds like it could be
> > >> > > > > > https://issues.apache.org/jira/browse/SOLR-3502.
> > >> > > > > >
> > >> > > > > > -Michael
> > >> > > > > >
> > >> > > > > > -Original Message-
> > >> 

Re: Zero Query results

2015-09-21 Thread Mark Fenbers
Ok, Erick, you provided useful info to help with my understanding.
However, I still get zero results when I search on literal text (e.g.,
"Wednesday"), even after making the changes you suggested. However, I
discovered that if I search on "Wednesday*" (trailing asterisk), then I
get all the results containing Wednesday that I'm looking for!  Why
would adding a wildcard token change the results I get back?


In my schema.xml, my customized section now looks like this, based on
your previous message:

<field name="id" type="date" indexed="true" stored="true" required="true" />
<field name="..." type="..." indexed="true" stored="true" required="true" />
<field name="..." type="..." indexed="true" stored="true" required="true" />
<field name="logtext" type="text_en" indexed="true" stored="true"
multiValued="true" />

Then I removed the data subdir, did a solr restart, and did a 
/dataimport again.  It successfully processed all 9857 documents. No 
stack traces in solr.log.  It is at this point that searching on 
Wednesday gave zero results (Boo!), but searching on Wednesday* gave 
hundreds of results. (Yay!)  My changes to schema.xml were to make 
logtext be the type "text_en".   Previously, the only line in schema.xml 
was the first one ("id"), and I changed that from type="text" to 
type="date" because it is a Timestamp object in Java and a "timestamp 
without time zone" in PostgreSQL.  But even with these changes, the 
results are the same as before.


Do you have any more ideas why searching on any literal string finds 
zero documents?


Thanks,
Mark


On 9/18/2015 10:30 PM, Erick Erickson wrote:

bq: There is no fieldType defined in my solrconfig.xml, unless you are
referring to this line:

Well, that's because you should be looking in schema.xml ;).

This line from your stacktrace file is very suspicious:
   logtext:Wednesday

It _looks_ like your logtext field is perhaps a "string" type. String
types are totally unanalyzed,
so unless the input matches _exactly_ (and by exactly I mean same case,
same words, same
order, identical punctuation) you won't find the doc. Thus with a
string field type, if the doc had
"my Dog has fleas.", searching for "my" or "My" or "My dog has fleas"
or "my Dog has fleas"
would all not find the doc (this last one has no period).

You usually want one of the text types, text_en or the like. Note that
you will be a _long_ time
figuring out how all that works and affects your searches, the
admin/analysis page is definitely
your friend.

There should be a line similar to

   <field name="logtext" type="text_en" indexed="true" stored="true"/>

Somewhere else there should be something like:

   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"> ... </fieldType>
The fieldType is what determines how the text is handled to search,
how it's broken up
and, in essence, how searches behave.

So what Erik and Shawn were asking is those two definitions.

Do note if you've changed the definitions here, it's usually wise to
'rm -rf solr/collection/data' and completely re-index from scratch.

Best,
Erick



Solr4.7: tlog replay has a major delay before it starts recovering the transaction log

2015-09-21 Thread Jeff Wu
Our environment runs Solr 4.7. We recently hit a core recovery failure and
then it retried to recover from the tlog.

We noticed that after "Recovery failed" was logged at 20:05:22, the Solr
server waited a long time before it started tlog replay. During that time, we
had about 32 cores doing such tlog replay. It took over 40 minutes to bring
the whole service back.

Some questions we want to know:
1. Is tlog replay a single-threaded activity? Can we configure multiple
threads, since in our deployment we have 64 cores on each Solr
server?

2. What might cause the tlog replay thread to wait for over 15 minutes
before actual tlog replay?  The actual replay seems very quick.

3. The last message "Log replay finished" does not tell which core it is
finished. Given 32 cores to recover, we can not know which core the log is
reporting.

4. We know 4.7 is pretty old; we'd like to know whether this is a known
issue fixed in a later release. Any related JIRA?

Line 4120: ERROR - 2015-09-16 20:05:22.396;
org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again...
(0) core=collection3_shard11_replica2
WARN  - 2015-09-16 20:22:50.343;
org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
tlog{file=/mnt/solrdata1/solr/home/collection3_shard11_replica2/data/tlog/tlog.0120498
refcount=2} active=true starting pos=25981
WARN  - 2015-09-16 20:22:53.301;
org.apache.solr.update.UpdateLog$LogReplayer; Log replay finished.
recoveryInfo=RecoveryInfo{adds=914 deletes=215 deleteByQuery=0 errors=0
positionOfStart=25981}

Thank you all~


Re: solr4.7: leader core does not get elected to another active core after solr OS shutdown, known issue?

2015-09-21 Thread Shalin Shekhar Mangar
Hi Jeff,

The leader election relies on ephemeral nodes in Zookeeper to detect
when leader or other nodes have gone down (abruptly). These ephemeral
nodes are automatically deleted by ZooKeeper after the ZK session
timeout which is by default 30 seconds. So if you kill a node then it
can take up to 30 seconds for the cluster to detect it and start a new
leader election. This won't be necessary during a graceful shutdown
because on shutdown the node will give up leader position so that a
new one can be elected. You could tune the zk session timeout to a
lower value but then it makes the cluster more sensitive to GC pauses
which can also trigger new leader elections.
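
For illustration, a minimal sketch (not from this thread) of the
ephemeral-node mechanism described above; the connect string, node path, and
timeout are assumptions:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class EphemeralDemo {
        public static void main(String[] args) throws Exception {
            // 30000 ms mirrors the default session timeout discussed above.
            ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> {});
            // An ephemeral node lives only as long as this session. If the JVM
            // is killed, ZK deletes the node only after the session timeout
            // expires, which is why leader-loss detection can take ~30 seconds.
            zk.create("/demo-leader", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            Thread.sleep(Long.MAX_VALUE);
        }
    }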

On Mon, Sep 21, 2015 at 5:55 PM, Jeff Wu  wrote:
> Our environment still runs Solr 4.7. Recently we noticed in a test that when
> we stopped one Solr server (solr02, via OS shutdown), all the cores of
> solr02 were shown as "down", but a few cores still remained leaders. After
> that, we quickly saw all other servers still sending requests to
> that down Solr server, and therefore we saw a lot of TCP waiting threads in
> the thread pools of other Solr servers since solr02 was already down.
>
> "shard53":{
> "range":"2666-2998",
> "state":"active",
> "replicas":{
>   "core_node102":{
> "state":"down",
> "base_url":"https://solr02.myhost/solr;,
> "core":"collection2_shard53_replica1",
> "node_name":"https://solr02.myhost_solr;,
> "leader":"true"},
>   "core_node104":{
> "state":"active",
> "base_url":"https://solr04.myhost/solr;,
> "core":"collection2_shard53_replica2",
> "node_name":"https://solr04.myhost/solr_solr"}}},
>
> Is this a known bug in 4.7, fixed later on? Any reference JIRA we
> can study? If the Solr service is stopped gracefully, we can see
> leader election happen and the leader switch to another active core. But if we
> just directly shut down a Solr machine's OS, we can reproduce in our environment
> that some "down" cores remain "leader" in the ZK clusterstate.json.



-- 
Regards,
Shalin Shekhar Mangar.


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-21 Thread Shalin Shekhar Mangar
Can you post your complete facet request as well as the schema
definition of the field on which you are faceting?

On Mon, Sep 21, 2015 at 5:39 PM, Uwe Reh  wrote:
> Hi,
>
> our bibliographic index (~20M entries) runs fine with Solr 4.10.3.
> With Solr 5.3, faceted searching is constantly, incredibly slow (~20 seconds)
>>
>> Output of 'debugQuery' (component timings, ms):
>>    process: 17705.0
>>    query:   2.0
>>    facet:   17590.0  !!
>>    debug:   111.0
>
>
> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> cumulative_hitratio of 1.
>
> - the behavior is the same, running Solr 5.3 on a copy of the old index
> (luceneMatch=4.6) or a newly built index
> - using 'facet.method=enum' makes no remarkable difference
> - declaring 'docValues' (with reindexing) makes no remarkable difference
> - 'softCommit' isn't used
>
> My environment is
>   OS: Solaris 5.11 on AMD64
>   JDK: 1.8.0_25 and 1.8.0_60 (same behavior)
>   JavaOpts: -Xmx10g -XX:+UseG1GC -XX:+AggressiveOpts -XX:+UseLargePages
> -XX:LargePageSizeInBytes=2m
>
> Any help/advice is welcome
> Uwe



-- 
Regards,
Shalin Shekhar Mangar.


modular QueryParser in contrib

2015-09-21 Thread Dmitry Kan
Hello!

Asked the question on IRC, mirroring it here too: In lucene level QP there
is a comment
https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParser.jj#L99
pointing to some contrib query parser, that offers modularity and
customizability.

Can you point to what the exact class is?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Spatial Search: distance based boosting

2015-09-21 Thread Aman Tandon
Hi,

Is there a way in solr to do the distance based boosting using Spatial RPT
field?

With Regards
Aman Tandon


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-21 Thread Joel Bernstein
Have you looked at your Solr instance with a cpu profiler like YourKit? It
would be useful to see the hotspots which should be really obvious with 20
second response times.

Also are you running in distributed mode or on a single Solr instance?

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Sep 21, 2015 at 9:42 AM, Uwe Reh  wrote:

> Am 21.09.2015 um 15:16 schrieb Shalin Shekhar Mangar:
>
>> Can you post your complete facet request as well as the schema
>> definition of the field on which you are faceting?
>>
>>
> Query:
>
>>
>> http://yxz/solr/hebis/select/?q=darwin&facet=true&facet.mincount=1&facet.limit=30&facet.field=material_access&facet.field=department_3&facet.field=rvk_facet&facet.field=author_facet&facet.field=material_brief&facet.field=language&facet.sort=count&echoParams=all&debugQuery=true
>>
>
>
>
> Schema (with docValue):
>
>> ...
>> > required="false" multiValued="true" docValues="true" />
>> > required="false" multiValued="true" docValues="true" />
>> ...
>> 
>> ...
>>
>
>
>
> Schema (w/o docValue):
>
>> ...
>> > required="false" multiValued="true" docValues="true" />
>> > required="false" multiValued="true" />
>> ...
>> 
>> ...
>>
>
>
>
> solrconfig:
>
>> ...
>> > showItems="48" />
>> ...
>> 
>>   
>>  10
>>  allfields
>>  none
>>   
>>   
>>  query
>>  facet
>>  stats
>>  debug
>>  elevator
>>   
>>
>>
>
>
>


Re: SolrCloud Startup question

2015-09-21 Thread Ravi Solr
Thank you Anshum & Upayavira.

BTW do any of you guys know if CloudSolrClient is ThreadSafe ??

Thanks,

Ravi Kiran Bhaskar

On Monday, September 21, 2015, Anshum Gupta  wrote:

> Hi Ravi,
>
> I just tried it out and here's my understanding:
>
> 1. Starting Solr with -c starts Solr in cloud mode. This is used to start
> Solr with an embedded zookeeper.
> 2. Starting Solr with -z starts Solr in cloud mode, with the zk connection
> string you specify. You don't need to explicitly specify -c in this case.
> The help text there needs a bit of fixing though
>
> *  -zZooKeeper connection string; only used when running in
> SolrCloud mode using -c*
> *   To launch an embedded ZooKeeper instance, don't pass
> this parameter.*
>
> *"only used when running in SolrCloud mode using -c" *needs to be rephrased
> or removed. Can you create a JIRA for the same?
>
>
> On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr  > wrote:
>
> > Can somebody kindly help me understand the difference between the
> following
> > startup calls ?
> >
> > ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> >
> > Vs
> >
> > ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> >
> > What happens if i don't pass the "-c" option ?? I read the documentation
> > but got more confused, I do run a ZK ensemble of 3 instances.  FYI my
> cloud
> > seems to work fine and the Admin UI shows the Cloud graph just fine, but I
> want
> > to just make sure I am doing the right thing and not missing any nuance.
> >
> > The following is from the documentation on cwiki.
> > ---
> >
> > "Start Solr in SolrCloud mode, which will also launch the embedded
> > ZooKeeper instance included with Solr.
> >
> > This option can be shortened to simply -c.
> >
> > If you are already running a ZooKeeper ensemble that you want to use
> > instead of the embedded (single-node) ZooKeeper, you should also pass the
> > -z parameter."
> >
> > -
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
>
>
>
> --
> Anshum Gupta
>


Re: SolrCloud Startup question

2015-09-21 Thread Anshum Gupta
CloudSolrClient is thread safe and it is highly recommended you reuse the
client.

If you are providing an HttpClient instance while constructing, make sure
that the HttpClient uses a multi-threaded connection manager.
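
For illustration, a minimal sketch of that recommendation (the ZK connect
string and collection name are assumptions; 5.x-era constructor):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SharedClientDemo {
        // One CloudSolrClient, created once and shared by every thread.
        private static final CloudSolrClient CLIENT =
                new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        static {
            CLIENT.setDefaultCollection("collection1");
        }

        public static void index(String id) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            CLIENT.add(doc);   // safe to call concurrently from many threads
        }
    }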

On Mon, Sep 21, 2015 at 3:13 PM, Ravi Solr  wrote:

> Thank you Anshum & Upayavira.
>
> BTW do any of you guys know if CloudSolrClient is ThreadSafe ??
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Monday, September 21, 2015, Anshum Gupta 
> wrote:
>
> > Hi Ravi,
> >
> > I just tried it out and here's my understanding:
> >
> > 1. Starting Solr with -c starts Solr in cloud mode. This is used to start
> > Solr with an embedded zookeeper.
> > 2. Starting Solr with -z starts Solr in cloud mode, with the zk
> connection
> > string you specify. You don't need to explicitly specify -c in this case.
> > The help text there needs a bit of fixing though
> >
> > *  -zZooKeeper connection string; only used when running in
> > SolrCloud mode using -c*
> > *   To launch an embedded ZooKeeper instance, don't pass
> > this parameter.*
> >
> > *"only used when running in SolrCloud mode using -c" *needs to be
> rephrased
> > or removed. Can you create a JIRA for the same?
> >
> >
> > On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr  > > wrote:
> >
> > > Can somebody kindly help me understand the difference between the
> > following
> > > startup calls ?
> > >
> > > ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> > >
> > > Vs
> > >
> > > ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> > >
> > > What happens if i don't pass the "-c" option ?? I read the
> documentation
> > > but got more confused, I do run a ZK ensemble of 3 instances.  FYI my
> > cloud
> > > seems to work fine and the Admin UI shows the Cloud graph just fine, but I
> > want
> > > to just make sure I am doing the right thing and not missing any
> nuance.
> > >
> > > The following is from the documentation on cwiki.
> > > ---
> > >
> > > "Start Solr in SolrCloud mode, which will also launch the embedded
> > > ZooKeeper instance included with Solr.
> > >
> > > This option can be shortened to simply -c.
> > >
> > > If you are already running a ZooKeeper ensemble that you want to use
> > > instead of the embedded (single-node) ZooKeeper, you should also pass
> the
> > > -z parameter."
> > >
> > > -
> > >
> > > Thanks
> > >
> > > Ravi Kiran Bhaskar
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> >
>



-- 
Anshum Gupta


Re: modular QueryParser in contrib

2015-09-21 Thread Jack Krupansky
Oops, sorry for the very old source code links, although nothing much
changed in the current release:
http://lucene.apache.org/core/5_3_0/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
http://lucene.apache.org/core/5_3_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html

-- Jack Krupansky

On Mon, Sep 21, 2015 at 6:57 AM, Jack Krupansky 
wrote:

> Probably a reference to the so-called flex query parser:
>
> https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html
>
> Read:
>
> https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
>
> The original Jira:
> https://issues.apache.org/jira/browse/LUCENE-1567
>
> This new query parser was dumped into Lucene some years ago, but I haven't
> noticed any real activity or interest in it.
>
> -- Jack Krupansky
>
> On Mon, Sep 21, 2015 at 6:36 AM, Dmitry Kan  wrote:
>
>> Hello!
>>
>> Asked the question on IRC, mirroring it here too: In lucene level QP there
>> is a comment
>>
>> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParser.jj#L99
>> pointing to some contrib query parser, that offers modularity and
>> customizability.
>>
>> Can you point to what the exact class is?
>>
>> --
>> Dmitry Kan
>> Luke Toolbox: http://github.com/DmitryKey/luke
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: http://twitter.com/dmitrykan
>> SemanticAnalyzer: www.semanticanalyzer.info
>>
>
>


Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Shawn Heisey
On 9/21/2015 3:09 AM, Upayavira wrote:
> Effectively, all it does is return the value of NOW according to the
> request, as the default value.
> 
> You could construct that on a per invocation basis, using
> System.getMillis() or whatever.

The millisecond timestamp isn't guaranteed to always increase on every
call -- it's not monotonic.

http://stackoverflow.com/a/2979239/2665648

If the OS and hardware are capable of doing it, nanoTime IS monotonic,
and MIGHT be updated more frequently.

Thanks,
Shawn
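
For illustration, one common workaround (an assumption, not from this thread)
is to combine the millisecond clock with an atomic counter so the generated
values are strictly increasing even when the clock repeats a value:

    import java.util.concurrent.atomic.AtomicLong;

    public class MonotonicMillis {
        private static final AtomicLong LAST = new AtomicLong();

        // Returns the wall clock when it has advanced, otherwise the
        // previous value + 1, so the result never repeats or goes backwards.
        public static long next() {
            return LAST.updateAndGet(prev ->
                    Math.max(prev + 1, System.currentTimeMillis()));
        }
    }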



Pivot facets

2015-09-21 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Hi, can someone suggest any workaround for my issue with pivot facets?

Use case: I have a collection with 5 levels of fields. In some documents the
data isn't present for all five levels (fields), and I don't index those
columns (no column for that doc).

In some search results the pivot facets return the nodes for all levels as a
tree (when data is present in all fields), and in some scenarios not all
levels are returned (e.g. where fields are missing). Is it mandatory to have
all the fields in each doc to always get the full pivot facet?

Thanks


Ravi Kumar Taminidi

Bosch Automotive Aftermarket
Automotive Service Solutions

28635 Mound Road
Warren, MI 48092
USA
www.bosch.com

Tel +1 586 578-7367
external.ravi.tamin...@us.bosch.com



Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Shawn Heisey
On 9/21/2015 9:01 AM, Gili Nachum wrote:
> Does TimestampUpdateProcessorFactory take effect only on the leader shard, or
> on each shard replica?
> If on each replica, then I would get different values on each replica.
>
> My alternative would be to perform a secondary sort on a UUID to ensure order.

If the update chain is configured properly, it runs on the leader, so
all replicas get the same timestamp.

Without SolrCloud, the way to create an "indexed at" time field is in
the schema -- specify a default value of NOW on the field definition and
don't send the field when indexing.  The old master/slave replication
copies the actual index contents, so the indexed values in all replicas
are the same.

The problem with NOW in the schema when running SolrCloud is that each
replica indexes the document independently, so each replica can have a
different timestamp.  This is why the timestamp update processor exists
-- to set the timestamp to a specific value before the document is
duplicated to each replica, eliminating the problem.

FYI, secondary sort parameters affect the order when the primary sort
field is identical between two documents.  It may not do what you are
intending because of that.

Thanks,
Shawn



Re: Ideas

2015-09-21 Thread Paul Libbrecht
Writing a query component would be pretty easy or?
It would throw an exception if crazy numbers are requested...

I can provide a simple example of a maven project for a query component.

Paul
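
Along those lines, a minimal sketch of such a component (the class name and
limits are assumptions; written against the 4.x SearchComponent API):

    import java.io.IOException;

    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class PagingGuardComponent extends SearchComponent {
        private static final int MAX_START = 10000;  // assumed limit
        private static final int MAX_ROWS  = 100;    // assumed limit

        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            int start = rb.req.getParams().getInt(CommonParams.START, 0);
            int rows  = rb.req.getParams().getInt(CommonParams.ROWS, 10);
            if (start > MAX_START || rows > MAX_ROWS) {
                throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                        "start=" + start + "/rows=" + rows + " exceed the allowed limits");
            }
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // nothing to do; the check happens in prepare()
        }

        @Override
        public String getDescription() {
            return "Rejects excessive start/rows values";
        }

        @Override
        public String getSource() {
            return null;  // part of the 4.x SolrInfoMBean contract
        }
    }

Registered in solrconfig.xml and listed in the handler's first-components, it
would fail such requests with HTTP 400 before they consume search threads.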


William Bell wrote:
> We have some Denial of service attacks on our web site. SOLR threads are
> going crazy.
>
> Basically someone is hitting start=15 + and rows=20. The start is crazy
> large.
>
> And then they jump around. start=15 then start=213030 etc.
>
> Any ideas for how to stop this besides blocking these IPs?
>
> Sometimes it is Google doing it even though these search results are set
> with No-index and No-Follow on these pages.
>
> Thoughts? Ideas?



Re: Ideas

2015-09-21 Thread DVT
Hi Bill,
  the classical way would be to have a reverse proxy in front of the
application that catches such cases. A decent reverse proxy or even
application firewall router will allow you to define limits on bandwidth
and sessions per time unit. Some even recognize specific
denial-of-service patterns.

Of course, you could also simply limit the ranges of parameters accepted
over the Internet - unless these wild ranges may actually occur in valid
scenarios.

A bit more complex is the third alternative that requires valid sessions
and permits paging only in one or the other direction. This way, start
and offset values would not be exposed, only functions for next
page/previous page or maybe some larger steps would be supported.
Stepping to one offset would also only be permitted if you come from a
proper previous page. Initial requests (in new sessions) would have to
start at offset 1. Constraints on the parameters in subsequent requests
within a session are a bit harder to handle.
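
As a concrete building block for that kind of forward-only paging, Solr 4.7+
offers cursorMark; a minimal SolrJ sketch, with the host, collection, and sort
field as assumptions:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class ForwardOnlyPaging {
        // Walks a result set page by page; clients never send a raw start
        // offset, so absurd start values cannot occur.
        public static void walk() throws SolrServerException {
            HttpSolrServer solr =
                    new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(20);
            q.setSort("id", SolrQuery.ORDER.asc); // cursors need a sort ending on the unique key
            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = solr.query(q);
                // ... hand rsp.getResults() to the caller ...
                String next = rsp.getNextCursorMark();
                if (cursor.equals(next)) break;   // no more results
                cursor = next;
            }
        }
    }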

Cheers,
--Jürgen

On 21.09.2015 19:28, William Bell wrote:
> We have some Denial of service attacks on our web site. SOLR threads are
> going crazy.
>
> Basically someone is hitting start=15 + and rows=20. The start is crazy
> large.
>
> And then they jump around. start=15 then start=213030 etc.
>
> Any ideas for how to stop this besides blocking these IPs?
>
> Sometimes it is Google doing it even though these search results are set
> with No-index and No-Follow on these pages.
>
> Thoughts? Ideas?
>
> Thanks
>

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением

*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
, URL: www.devoteam.de



Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071





Re: Ideas

2015-09-21 Thread Doug Turnbull
The nginx reverse proxy we use blocks ridiculous start and rows values

https://github.com/o19s/solr_nginx

Another silly thing I've noticed is you can pass sleep() as a function
query. It's not documented, but I think a big hole. I wonder if I could DoS
your Solr by sleeping and hogging all the available query threads?

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.solr/solr-core/4.3.0/org/apache/solr/search/ValueSourceParser.java#114

On Mon, Sep 21, 2015 at 1:37 PM, Jürgen Wagner (DVT) <
juergen.wag...@devoteam.com> wrote:

> Hi Bill,
>   the classical way would be to have a reverse proxy in front of the
> application that catches such cases. A decent reverse proxy or even
> application firewall router will allow you to define limits on bandwidth
> and sessions per time unit. Some even recognize specific denial-of-service
> patterns.
>
> Of course, you could also simply limit the ranges of parameters accepted
> over the Internet - unless these wild ranges may actually occur in valid
> scenarios.
>
> A bit more complex is the third alternative that requires valid sessions
> and permits paging only in one or the other direction. This way, start and
> offset values would not be exposed, only functions for next page/previous
> page or maybe some larger steps would be supported. Stepping to one offset
> would also only be permitted if you come from a proper previous page.
> Initial requests (in new sessions) would have to start at offset 1.
> Constraints on the parameters in subsequent requests within a session are a
> bit harder to handle.
>
> Cheers,
> --Jürgen
>
> On 21.09.2015 19:28, William Bell wrote:
>
> We have some Denial of service attacks on our web site. SOLR threads are
> going crazy.
>
> Basically someone is hitting start=15 + and rows=20. The start is crazy
> large.
>
> And then they jump around. start=15 then start=213030 etc.
>
> Any ideas for how to stop this besides blocking these IPs?
>
> Sometimes it is Google doing it even though these search results are set
> with No-index and No-Follow on these pages.
>
> Thoughts? Ideas?
>
> Thanks
>
>
>
> Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
> уважением
>
> *i.A. Jürgen Wagner*
> Head of Competence Center "Intelligence"
> & Senior Cloud Consultant
>
> Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
> Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
> E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
> --
> Managing Board: Jürgen Hatzipantelis (CEO)
> Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
> Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
>
>
>
>


-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
, LLC | 240.476.9983
Author: Relevant Search 
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Ideas

2015-09-21 Thread William Bell
We have some Denial of service attacks on our web site. SOLR threads are
going crazy.

Basically someone is hitting start=15 + and rows=20. The start is crazy
large.

And then they jump around. start=15 then start=213030 etc.

Any ideas for how to stop this besides blocking these IPs?

Sometimes it is Google doing it even though these search results are set
with No-index and No-Follow on these pages.

Thoughts? Ideas?

Thanks

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: solr4.7: leader core does not get elected to other active core after solr OS shutdown, known issue?

2015-09-21 Thread Jeff Wu
Hi Shai, still the same question: other peer cores, which are active, did
not claim leadership even after a long time. However, some of the peer
cores claimed leadership earlier, while the server was stopping. Those are
inconsistent results.

2015-09-21 10:52 GMT-04:00 Shai Erera :

> I don't think the process Shalin describes applies to clusterstate.json.
> That JSON object reflects the status Solr "knows" about, or "last known
> status". When Solr is properly shutdown, I believe those attributes are
> cleared from clusterstate.json, as well the leaders give up their lease.
>
> However, when Solr is killed, it takes ZK the 30 seconds or so timeout to
> kill the ephemeral node and release the leader lease. ZK is unaware of
> Solr's clusterstate.json and cannot update the 'leader' property to false.
> It simply releases the lease, so that other cores may claim it.
>
> Perhaps that explains the confusion?
>
> Shai
>
> On Mon, Sep 21, 2015 at 4:36 PM, Jeff Wu  wrote:
>
> > Hi Shalin,  thank you for the response.
> >
> > We waited longer enough than the ZK session timeout time, and it still
> did
> > not kick off any leader election for these "remained down-leader" cores.
> > That's the question I'm actually asking.
> >
> > Our test scenario:
> >
> > Each solr server has 64 cores, and they are all active, and all leader
> > cores.
> > Shutdown the linux OS.
> > Monitor clusterstate.json over ZK, after enough ZK session timeout value.
> > We noticed some cores has leader election happened. But still saw some
> down
> > cores remains leader.
> >
> > 2015-09-21 9:15 GMT-04:00 Shalin Shekhar Mangar  >:
> >
> > > Hi Jeff,
> > >
> > > The leader election relies on ephemeral nodes in Zookeeper to detect
> > > when leader or other nodes have gone down (abruptly). These ephemeral
> > > nodes are automatically deleted by ZooKeeper after the ZK session
> > > timeout which is by default 30 seconds. So if you kill a node then it
> > > can take up to 30 seconds for the cluster to detect it and start a new
> > > leader election. This won't be necessary during a graceful shutdown
> > > because on shutdown the node will give up leader position so that a
> > > new one can be elected. You could tune the zk session timeout to a
> > > lower value but then it makes the cluster more sensitive to GC pauses
> > > which can also trigger new leader elections.
> > >
> > > On Mon, Sep 21, 2015 at 5:55 PM, Jeff Wu  wrote:
> > > > Our environment still runs with Solr 4.7. Recently we noticed in a
> test.
> > > When
> > > > we stopped 1 solr server(solr02, which did OS shutdown), all the
> cores
> > of
> > > > solr02 are shown as "down", but remains a few cores still as leaders.
> > > After
> > > > that, we quickly seeing all other servers are still sending requests
> to
> > > > that down solr server, and therefore we saw a lot of TCP waiting
> > threads
> > > in
> > > > thread pool of other solr servers since solr02 already down.
> > > >
> > > > "shard53":{
> > > > "range":"2666-2998",
> > > > "state":"active",
> > > > "replicas":{
> > > >   "core_node102":{
> > > > "state":"down",
> > > > "base_url":"https://solr02.myhost/solr",
> > > > "core":"collection2_shard53_replica1",
> > > > "node_name":"https://solr02.myhost_solr",
> > > > "leader":"true"},
> > > >   "core_node104":{
> > > > "state":"active",
> > > > "base_url":"https://solr04.myhost/solr",
> > > > "core":"collection2_shard53_replica2",
> > > > "node_name":"https://solr04.myhost/solr_solr"}}},
> > > >
> > > > Is this a known bug in 4.7 that was later fixed? Any reference
> > JIRA
> > > we
> > > > can study about?  If the solr service is stopped gracefully, we can
> see
> > > > leader core election happens and switched to other active core. But
> if
> > we
> > > > just directly shutdown a Solr OS, we can reproduce in our environment
> > > that
> > > > some "Down" cores remains "leader" at ZK clusterstate.json
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>



-- 
Jeff Wu
---
CSDL Beijing, China


Re: Ideas

2015-09-21 Thread Walter Underwood
I have put a limit in the front end at a couple of sites. Nobody gets more than 
50 pages of results. Show page 50 if they request beyond that.

First got hit by this at Netflix, years ago.

Solr 4 is much better about deep paging, but here at Chegg we got deep paging 
plus a stupid, long query. That was using too much CPU.

Right now, block the IPs. Those are hostile.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 21, 2015, at 10:31 AM, Paul Libbrecht  wrote:
> 
> Writing a query component would be pretty easy, no?
> It would throw an exception if crazy numbers are requested...
> 
> I can provide a simple example of a maven project for a query component.
> 
> Paul
> 
> 
> William Bell wrote:
>> We have some Denial of service attacks on our web site. SOLR threads are
>> going crazy.
>> 
>> Basically someone is hitting start=15 + and rows=20. The start is crazy
>> large.
>> 
>> And then they jump around. start=15 then start=213030 etc.
>> 
>> Any ideas for how to stop this besides blocking these IPs?
>> 
>> Sometimes it is Google doing it even though these search results are set
>> with No-index and No-Follow on these pages.
>> 
>> Thoughts? Ideas?
> 
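
A sketch of the component Paul describes (hypothetical class and limits,
written against the Solr 5.x SearchComponent API; 4.x additionally wants a
getSource() override):

import java.io.IOException;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class PagingGuardComponent extends SearchComponent {
  private static final int MAX_START = 10000;  // illustrative caps
  private static final int MAX_ROWS  = 100;

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // runs before the query executes, so crazy numbers never hit the index
    int start = rb.req.getParams().getInt(CommonParams.START, 0);
    int rows  = rb.req.getParams().getInt(CommonParams.ROWS, 10);
    if (start > MAX_START || rows > MAX_ROWS) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "start/rows exceeds configured limits");
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // validation only; nothing to do at process time
  }

  @Override
  public String getDescription() {
    return "Guards against denial-of-service paging parameters";
  }
}

Registered under first-components for the relevant handler in solrconfig.xml,
it rejects the request before any searching happens.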



Re: Zero Query results

2015-09-21 Thread Erick Erickson
bq: However, I discovered that if I search on "Wednesday*" (trailing
asterisk), then I get all the results containing Wednesday that I'm
looking for!

This almost always means you're not searching on the field you think
you're searching on and/or the field isn't being analyzed as you think
(i.e. the fieldType isn't what you expect). If you're really searching
on a fieldType of text_en (and you haven't changed the definition),
then there's something very weird here. FieldTypes are totally
mutable, they are composed of various analysis chains that you (or
someone else) can freely alter, so seeing the <field> definition that
references a type="text_en" is suggestive but not definitive.

I'm going to further guess that when you search on "Wednesday*", all
the matches are at the beginning of the line, and you find docs where
the field has "Wednesday, September" but not "The party was on
Wednesday".

So let's see the <fieldType> associated with the logtext field. Plus,
the results of adding &debug=true to the query.

But you can get a lot of info a lot faster if you go to the admin UI
screen, select the proper core from the drop-down on the left side and
go to the "analysis" section. Pick the field (or field type), enter
some text and hit analyze (or uncheck the "verbose" box, that's
largely uninteresting info at this level). That'll show you exactly
how the input document is parsed, exactly how the query is parsed etc.
And be sure to enter something like
"september first was a Wednesday" in the left-hand (index) box, then
just "Wednesday" in the right hand (query) side. My bet: You'll see on
the index side that the input is not broken up, not transformed, etc.

Best,
Erick

On Mon, Sep 21, 2015 at 9:49 AM, Mark Fenbers  wrote:
> Ok, Erick, you provided useful info to help with my understanding. However,
> I still get zero results when I search on literal text (e.g., "Wednesday"),
> even with making changes that you suggest. However, I discovered that if I
> search on "Wednesday*" (trailing asterisk), then I get all the results
> containing Wednesday that I'm looking for!  Why would adding a wildcard
> token change the results I get back?
>
> In my schema.xml, my customized section now looks like this, based on your
> previous message:
>
> 
>  required="true" />
>  required="true" />
>  required="true" />
>
>  multiValued="true" />
> 
> 
>
> Then I removed the data subdir, did a solr restart, and did a /dataimport
> again.  It successfully processed all 9857 documents. No stack traces in
> solr.log.  It is at this point that searching on Wednesday gave zero results
> (Boo!), but searching on Wednesday* gave hundreds of results. (Yay!)  My
> changes to schema.xml were to make logtext be the type "text_en".
> Previously, the only line in schema.xml was the first one ("id"), and I
> changed that from type="text" to type="date" because it is a Timestamp
> object in Java and a "timestamp without time zone" in PostgreSQL.  But even
> with these changes, the results are the same as before.
>
> Do you have any more ideas why searching on any literal string finds zero
> documents?
>
> Thanks,
> Mark
>
>
> On 9/18/2015 10:30 PM, Erick Erickson wrote:
>>
>> bq: There is no fieldType defined in my solrconfig.xml, unless you are
>> referring to this line:
>>
>> Well, that's because you should be looking in schema.xml ;).
>>
>> This line from your stacktrace file is very suspicious:
>>logtext:Wednesday
>>
>> It _looks_ like your logtext field is perhaps a "string" type. String
>> types are totally unanalyzed,
>> so unless the input matches _exactly_ (and by exactly I mean same case,
>> same words, same
>> order, identical punctuation) you won't find the doc. Thus with a
>> string field type, if the doc had
>> "my Dog has fleas.", searching for "my" or "My" or "My dog has fleas"
>> or "my Dog has fleas"
>> would all not find the doc (this last one has no period).
>>
>> You usually want one of the text types, text_en or the like. Note that
>> you will be a _long_ time
>> figuring out how all that works and affects your searches, the
>> admin/analysis page is definitely
>> your friend.
>>
>> There should be a line similar to
>> <field name="logtext" type="text_en" ... />
>>
>> Somewhere else there should be something like:
>> <fieldType name="text_en" ... bunch of lines, maybe not />
>>
>> The fieldType is what determines how the text is handled to search,
>> how it's broken up
>> and, in essence, how searches behave.
>>
>> So what Erik and Shawn were asking is those two definitions.
>>
>> Do note if you've changed the definitions here, it's usually wise to
>> 'rm -rf /data' and completely re-index from scratch.
>>
>> Best,
>> Erick
>>
>


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Upayavira
Aman,

I cannot promise to answer questions promptly - like most people on this
list, we answer if/when we have a gap in our workload.

The reason you are getting the non multiValued field error is because
your latlon field does not have multiValued="true" enabled.

However, the field type definition notes that this field type does not
support multivalued fields, so you're not gonna get anywhere with that
route.

Have you tried the location_rpt type?
(solr.SpatialRecursivePrefixTreeFieldType). This is a newer, and as I
understand it, far more flexible field type - for example, you can index
shapes into it as well as locations.

I'd suggest you read this page, and pay particular attention to mentions
of RPT:

https://cwiki.apache.org/confluence/display/solr/Spatial+Search
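
For reference, the stock 4.x example schema declares it roughly like this (5.x
renames units to distanceUnits); the latlon field name is from your setup:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.001" units="degrees" />

<field name="latlon" type="location_rpt" indexed="true" stored="true"
       multiValued="true" />

Values go in as "lat,lon" strings (or WKT shapes), so there are no copied
*_coordinate sub-fields for atomic updates to trip over.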

Upayavira

On Mon, Sep 21, 2015, at 10:36 AM, Aman Tandon wrote:
> Upayavira, please help
> 
> With Regards
> Aman Tandon
> 
> On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon 
> wrote:
> 
> > Error is
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <response>
> > <lst name="responseHeader"><int name="status">400</int><int
> > name="QTime">28</int></lst><lst name="error"><str name="msg">ERROR:
> > [doc=9474144846] multiple values encountered for non multiValued field
> > latlon_0_coordinate: [11.0183, 11.0183]</str><int name="code">400</int></lst>
> > </response>
> >
> > And my configuration is
> >
> > <dynamicField name="*_coordinate" ... stored="true" />
> >
> > <fieldType ... subFieldSuffix="_coordinate"/>
> >
> > <field name="latlon" ... required="false" multiValued="false" />
> >
> >  how you know it is because of stored="true"?
> >
> > As Erick replied in the last mail thread,
> > I'm not getting any multiple values in the _coordinate fields. However, I
> > _do_ get the error if my dynamic *_coordinate field is set to
> > stored="true".
> >
> > And stored="true" is mandatory for using the atomic updates.
> >
> > With Regards
> > Aman Tandon
> >
> > On Mon, Sep 21, 2015 at 2:22 PM, Upayavira  wrote:
> >
> >> Can you show the error you are getting, and how you know it is because
> >> of stored="true"?
> >>
> >> Upayavira
> >>
> >> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
> >> > Hi Erick,
> >> >
> >> > I am getting the same error because my dynamic field *_coordinate is
> >> > stored="true".
> >> > How can I get rid of this error?
> >> >
> >> > And I have to use the atomic update. Please help!!
> >> >
> >> > With Regards
> >> > Aman Tandon
> >> >
> >> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa 
> >> > wrote:
> >> >
> >> > > Hey Erick, i think that you were right, there was a mix in the
> >> schemas and
> >> > > that was generating the error on some of the documents.
> >> > >
> >> > > Thanks for the help guys!
> >> > >
> >> > >
> >> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson :
> >> > >
> >> > > > Hmmm, I just tried this with a 4.x build and I can update the
> >> document
> >> > > > multiple times without a problem. I just indexed the standard
> >> exampledocs
> >> > > > and then updated a doc like this (vidcard.xml was the base):
> >> > > >
> >> > > > <add>
> >> > > > <doc>
> >> > > >   <field name="id">EN7800GTX/2DHTV/256M</field>
> >> > > >
> >> > > >   <field name="name" update="set">eoe changed this puppy</field>
> >> > > > </doc>
> >> > > > </add>
> >> > > >
> >> > > > I'm not getting any multiple values in the _coordinate fields.
> >> However, I
> >> > > > _do_ get the error if my dynamic *_coordinate field is set to
> >> > > > stored="true".
> >> > > >
> >> > > > Did you perhaps change this at some point? Whenever I change the
> >> schema,
> >> > > I
> >> > > > try to 'rm -rf solr/collection/data' just to be sure I've purged all
> >> > > traces
> >> > > > of the former schema definition.
> >> > > >
> >> > > > Best,
> >> > > > Erick
> >> > > >
> >> > > >
> >> > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa 
> >> > > wrote:
> >> > > >
> >> > > > > No, they are not declared explicitly.
> >> > > > >
> >> > > > > This is how they are created:
> >> > > > >
> >> > > > > <field name="latLong" ... stored="true"/>
> >> > > > >
> >> > > > > <dynamicField name="*_coordinate" ... stored="false"/>
> >> > > > >
> >> > > > > <fieldType ... subFieldSuffix="_coordinate"/>
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > 2014-08-04 22:28 GMT-03:00 Michael Ryan :
> >> > > > >
> >> > > > > > Are the latLong_0_coordinate and latLong_1_coordinate fields
> >> > > populated
> >> > > > > > using copyField? If so, this sounds like it could be
> >> > > > > > https://issues.apache.org/jira/browse/SOLR-3502.
> >> > > > > >
> >> > > > > > -Michael
> >> > > > > >
> >> > > > > > -Original Message-
> >> > > > > > From: Franco Giacosa [mailto:fgiac...@gmail.com]
> >> > > > > > Sent: Monday, August 04, 2014 9:05 PM
> >> > > > > > To: solr-user@lucene.apache.org
> >> > > > > > Subject: solr update dynamic field generates multiValued error
> >> > > > > >
> >> > > > > > Hello everyone, this is my first time posting a question, so
> >> forgive
> >> > > me
> >> > > > > if
> >> > > > > > i'm missing something.
> >> > > > > >
> >> > > > > > This is my problem:
> >> > > > > >
> >> > > > > > I have a schema.xml that has the following latLong 

Re: Zero Query results

2015-09-21 Thread Mark Fenbers
You were right about finding only the Wednesday occurrences at the 
beginning of the line.  But attached (if it works) is a screen capture 
of my admin UI.  But unlike your suspicion, the index text is being 
parsed properly, it appears.  So I'm uncertain where this leads me.


Also attached is the pertinent schema.xml snippet you asked for.

The logtext column in my table contains merely keyboarded text, with the 
infrequent exception that I add a \uFFFC as a placeholder for images.  
So, should I be using something besides text_en as the fieldType?


Thanks,
Mark

On 9/21/2015 12:12 PM, Erick Erickson wrote:

bq: However, I discovered that if I search on "Wednesday*" (trailing
asterisk), then I get all the results containing Wednesday that I'm
looking for!

This almost always means you're not searching on the field you think
you're searching on and/or the field isn't being analyzed as you think
(i.e. the fieldType isn't what you expect). If you're really searching
on a fieldType of text_en (and you haven't changed the definition),
then there's something very weird here. FieldTypes are totally
mutable, they are composed of various analysis chains that you (or
someone else) can freely alter, so seeing the <field> definition that
references a type="text_en" is suggestive but not definitive.

I'm going to further guess that when you search on "Wednesday*", all
the matches are at the beginning of the line, and you find docs where
the field has "Wednesday, September" but not "The party was on
Wednesday".

So let's see the <fieldType> associated with the logtext field. Plus,
the results of adding &debug=true to the query.

But you can get a lot of info a lot faster if you go to the admin UI
screen, select the proper core from the drop-down on the left side and
go to the "analysis" section. Pick the field (or field type), enter
some text and hit analyze (or uncheck the "verbose" box, that's
largely uninteresting info at this level). That'll show you exactly
how the input document is parsed, exactly how the query is parsed etc.
And be sure to enter something like
"september first was a Wednesday" in the left-hand (index) box, then
just "Wednesday" in the right hand (query) side. My bet: You'll see on
the index side that the input is not broken up, not transformed, etc.

Best,
Erick






Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Gili Nachum
Does TimestampUpdateProcessorFactory take effect only on the leader shard, or on
each shard replica?
If on each replica, then I would get different values on each replica.

My alternative would be to perform secondary sort on a UUID to ensure order.
Thanks.

On Mon, Sep 21, 2015 at 12:09 PM, Upayavira  wrote:

> There's nothing to stop you creating your own
> TimestampUpdateProcessorFactory, here's the entire source for it:
>
> public class TimestampUpdateProcessorFactory
>   extends AbstractDefaultValueUpdateProcessorFactory {
>
>   @Override
>   public UpdateRequestProcessor getInstance(SolrQueryRequest req,
> SolrQueryResponse rsp,
> UpdateRequestProcessor next
> ) {
> return new DefaultValueUpdateProcessor(fieldName, next) {
>   @Override
>   public Object getDefaultValue() {
> return SolrRequestInfo.getRequestInfo().getNOW();
>   }
> };
>   }
> }
>
> Effectively, all it does is return the value of NOW according to the
> request, as the default value.
>
> You could construct that on a per invocation basis, using
> System.getMillis() or whatever.
>
> Upayavira
>
> On Mon, Sep 21, 2015, at 07:34 AM, Gili Nachum wrote:
> > I've implemented a custom solr2solr ongoing unidirectional replication
> > mechanism.
> >
> > A Replicator (acting as solrJ client), crawls documents from SolrCloud1
> > and
> > writes them to SolrCloud2 in batches.
> > The replicator crawl logic is to read documents with a time
> > greater/equale
> > to the time of the last replicated document.
> > Whenever a document is added/updated, I auto updated a a tdate field
> > "last_updated_in_solr" using TimestampUpdateProcessorFactory.
> >
> > *My problem: *When a client indexes a batch of 100 documents, all 100
> > docs
> > have the same "last_updated_in_solr" value. This makes my ongoing
> > replication check for new documents to replicate much more complex than
> > if
> > the time value was unique.
> >
> > 1. Can I use some other processor to generate increasing unique values?
> > 2. Can I use the internal _version_ field for this? is it guaranteed to
> > be
> > monotonically increasing for the entire collection or only per document,
> > with each add/update?
> > Any other options?
> >
> > Schema.xml:
> >  > stored="true" multiValued="false"/>
> >
> > solrconfig.xml:
> > 
> >
> >last_updated_in_solr
> >
> >
> >
> > 
> >
> > I know there's work for a build-in replication mechanism, but it's not
> > yet
> > released.
> > Using Solr 4.7.2.
>
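
A sketch of the per-invocation variant Upayavira suggests (hypothetical class,
not shipped with Solr; it is only monotonic within one JVM, so the chain must
run on the leader, and AtomicLong.updateAndGet needs Java 8):

import java.util.Date;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory;
import org.apache.solr.update.processor.DefaultValueUpdateProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class MonotonicTimestampUpdateProcessorFactory
    extends AbstractDefaultValueUpdateProcessorFactory {

  // last value handed out; never reissued, even within one millisecond
  private static final AtomicLong LAST = new AtomicLong();

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new DefaultValueUpdateProcessor(fieldName, next) {
      @Override
      public Object getDefaultValue() {
        long now = System.currentTimeMillis();
        // strictly increasing: bump by 1ms whenever the clock has not moved
        return new Date(LAST.updateAndGet(prev -> Math.max(prev + 1, now)));
      }
    };
  }
}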


Re: write.lock

2015-09-21 Thread Mark Fenbers
A snippet of my solrconfig.xml is attached.  The snippet only contains 
the Spell checking sections (for brevity) which should be sufficient for 
you to see all the pertinent info you seek.


Thanks!
Mark

On 9/19/2015 3:29 AM, Mikhail Khludnev wrote:

Mark,

What's your solrconfig.xml?

On Sat, Sep 19, 2015 at 12:34 AM, Mark Fenbers 
wrote:


Greetings,

Whenever I try to build my spellcheck index
(params.set("spellcheck.build", true); or put a check in the
spellcheck.build box in the web interface) I get the following stacktrace.
Removing the write.lock file does no good.  The message comes right back
anyway.  I read in a post that increasing writeLockTimeout would help.  It
did not help for me even increasing it to 20,000 msec.  If I don't build,
then my resultset count is always 0, i.e., empty results.  What could be
causing this?

Mark






 
  

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_en</str>

  <lst name="spellchecker">
    <str name="name">index</str>
    <str name="field">logtext</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">/localapps/dev/EventLog/index</str>
    <str name="buildOnCommit">true</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">logtext</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>

  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="field">logtext</str>
    <str name="name">FileDict</str>
    <str name="sourceLocation">/usr/share/dict/words</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">/localapps/dev/EventLog/index</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck.dictionary">FileDict</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>




Re: modular QueryParser in contrib

2015-09-21 Thread Dmitry Kan
Thanks for the valuable links Jack.

Dmitry

On Mon, Sep 21, 2015 at 5:09 PM, Jack Krupansky 
wrote:

> Oops, sorry for the very old source code links, although nothing much
> changed in the current release:
>
> http://lucene.apache.org/core/5_3_0/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
>
> http://lucene.apache.org/core/5_3_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html
>
> -- Jack Krupansky
>
> On Mon, Sep 21, 2015 at 6:57 AM, Jack Krupansky 
> wrote:
>
> > Probably a reference to the so-called flex query parser:
> >
> >
> https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html
> >
> > Read:
> >
> >
> https://lucene.apache.org/core/4_10_0/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
> >
> > The original Jira:
> > https://issues.apache.org/jira/browse/LUCENE-1567
> >
> > This new query parser was dumped into Lucene some years ago, but I
> haven't
> > noticed any real activity or interest in it.
> >
> > -- Jack Krupansky
> >
> > On Mon, Sep 21, 2015 at 6:36 AM, Dmitry Kan 
> wrote:
> >
> >> Hello!
> >>
> >> Asked the question on IRC, mirroring it here too: In lucene level QP
> there
> >> is a comment
> >>
> >>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/lucene/queryparser/src/java/org/apache/lucene/queryparser/classic/QueryParser.jj#L99
> >> pointing to some contrib query parser, that offers modularity and
> >> customizability.
> >>
> >> Can you point to what the exact class is?
> >>
> >> --
> >> Dmitry Kan
> >> Luke Toolbox: http://github.com/DmitryKey/luke
> >> Blog: http://dmitrykan.blogspot.com
> >> Twitter: http://twitter.com/dmitrykan
> >> SemanticAnalyzer: www.semanticanalyzer.info
> >>
> >
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
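
For anyone landing here later, a minimal usage sketch of the flexible parser
(Lucene 5.x API):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;

public class FlexParserDemo {
  public static void main(String[] args) throws QueryNodeException {
    // the processing pipeline behind parse() is what makes this parser modular
    StandardQueryParser parser = new StandardQueryParser(new StandardAnalyzer());
    Query q = parser.parse("title:(+solr +lucene) AND modular", "defaultField");
    System.out.println(q);
  }
}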


Re: solr4.7: leader core does not get elected to other active core after solr OS shutdown, known issue?

2015-09-21 Thread Shai Erera
I don't think the process Shalin describes applies to clusterstate.json.
That JSON object reflects the status Solr "knows" about, or "last known
status". When Solr is properly shutdown, I believe those attributes are
cleared from clusterstate.json, as well the leaders give up their lease.

However, when Solr is killed, it takes ZK the 30 seconds or so timeout to
kill the ephemeral node and release the leader lease. ZK is unaware of
Solr's clusterstate.json and cannot update the 'leader' property to false.
It simply releases the lease, so that other cores may claim it.

Perhaps that explains the confusion?

Shai

On Mon, Sep 21, 2015 at 4:36 PM, Jeff Wu  wrote:

> Hi Shalin,  thank you for the response.
>
> We waited longer enough than the ZK session timeout time, and it still did
> not kick off any leader election for these "remained down-leader" cores.
> That's the question I'm actually asking.
>
> Our test scenario:
>
> Each solr server has 64 cores, and they are all active, and all leader
> cores.
> Shutdown the linux OS.
> Monitor clusterstate.json over ZK, after enough ZK session timeout value.
> We noticed some cores has leader election happened. But still saw some down
> cores remains leader.
>
> 2015-09-21 9:15 GMT-04:00 Shalin Shekhar Mangar :
>
> > Hi Jeff,
> >
> > The leader election relies on ephemeral nodes in Zookeeper to detect
> > when leader or other nodes have gone down (abruptly). These ephemeral
> > nodes are automatically deleted by ZooKeeper after the ZK session
> > timeout which is by default 30 seconds. So if you kill a node then it
> > can take up to 30 seconds for the cluster to detect it and start a new
> > leader election. This won't be necessary during a graceful shutdown
> > because on shutdown the node will give up leader position so that a
> > new one can be elected. You could tune the zk session timeout to a
> > lower value but then it makes the cluster more sensitive to GC pauses
> > which can also trigger new leader elections.
> >
> > On Mon, Sep 21, 2015 at 5:55 PM, Jeff Wu  wrote:
> > > Our environment still runs with Solr 4.7. Recently we noticed in a test.
> > When
> > > we stopped 1 solr server(solr02, which did OS shutdown), all the cores
> of
> > > solr02 are shown as "down", but remains a few cores still as leaders.
> > After
> > > that, we quickly seeing all other servers are still sending requests to
> > > that down solr server, and therefore we saw a lot of TCP waiting
> threads
> > in
> > > thread pool of other solr servers since solr02 already down.
> > >
> > > "shard53":{
> > > "range":"2666-2998",
> > > "state":"active",
> > > "replicas":{
> > >   "core_node102":{
> > > "state":"down",
> > > "base_url":"https://solr02.myhost/solr",
> > > "core":"collection2_shard53_replica1",
> > > "node_name":"https://solr02.myhost_solr",
> > > "leader":"true"},
> > >   "core_node104":{
> > > "state":"active",
> > > "base_url":"https://solr04.myhost/solr",
> > > "core":"collection2_shard53_replica2",
> > > "node_name":"https://solr04.myhost/solr_solr"}}},
> > >
> > > Is this a known bug in 4.7 that was later fixed? Any reference
> JIRA
> > we
> > > can study about?  If the solr service is stopped gracefully, we can see
> > > leader core election happens and switched to other active core. But if
> we
> > > just directly shutdown a Solr OS, we can reproduce in our environment
> > that
> > > some "Down" cores remains "leader" at ZK clusterstate.json
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Aman Tandon
Upayavira, please help

With Regards
Aman Tandon

On Mon, Sep 21, 2015 at 2:38 PM, Aman Tandon 
wrote:

> Error is
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int
> name="QTime">28</int></lst><lst name="error"><str name="msg">ERROR:
> [doc=9474144846] multiple values encountered for non multiValued field
> latlon_0_coordinate: [11.0183, 11.0183]</str><int name="code">400</int></lst>
> </response>
>
> And my configuration is
>
> <dynamicField name="*_coordinate" ... stored="true" />
>
> <fieldType ... subFieldSuffix="_coordinate"/>
>
> <field name="latlon" ... required="false" multiValued="false" />
>
>  how you know it is because of stored="true"?
>
> As Erick replied in the last mail thread,
> I'm not getting any multiple values in the _coordinate fields. However, I
> _do_ get the error if my dynamic *_coordinate field is set to
> stored="true".
>
> And stored="true" is mandatory for using the atomic updates.
>
> With Regards
> Aman Tandon
>
> On Mon, Sep 21, 2015 at 2:22 PM, Upayavira  wrote:
>
>> Can you show the error you are getting, and how you know it is because
>> of stored="true"?
>>
>> Upayavira
>>
>> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
>> > Hi Erick,
>> >
>> > I am getting the same error because my dynamic field *_coordinate is
>> > stored="true".
>> > How can I get rid of this error?
>> >
>> > And I have to use the atomic update. Please help!!
>> >
>> > With Regards
>> > Aman Tandon
>> >
>> > On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa 
>> > wrote:
>> >
>> > > Hey Erick, i think that you were right, there was a mix in the
>> schemas and
>> > > that was generating the error on some of the documents.
>> > >
>> > > Thanks for the help guys!
>> > >
>> > >
>> > > 2014-08-05 1:28 GMT-03:00 Erick Erickson :
>> > >
>> > > > Hmmm, I just tried this with a 4.x build and I can update the
>> document
>> > > > multiple times without a problem. I just indexed the standard
>> exampledocs
>> > > > and then updated a doc like this (vidcard.xml was the base):
>> > > >
>> > > > <add>
>> > > > <doc>
>> > > >   <field name="id">EN7800GTX/2DHTV/256M</field>
>> > > >
>> > > >   <field name="name" update="set">eoe changed this puppy</field>
>> > > > </doc>
>> > > > </add>
>> > > >
>> > > > I'm not getting any multiple values in the _coordinate fields.
>> However, I
>> > > > _do_ get the error if my dynamic *_coordinate field is set to
>> > > > stored="true".
>> > > >
>> > > > Did you perhaps change this at some point? Whenever I change the
>> schema,
>> > > I
>> > > > try to 'rm -rf solr/collection/data' just to be sure I've purged all
>> > > traces
>> > > > of the former schema definition.
>> > > >
>> > > > Best,
>> > > > Erick
>> > > >
>> > > >
>> > > > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa 
>> > > wrote:
>> > > >
>> > > > > No, they are not declared explicitly.
>> > > > >
>> > > > > This is how they are created:
>> > > > >
>> > > > > <field name="latLong" ... stored="true"/>
>> > > > >
>> > > > > <dynamicField name="*_coordinate" ... stored="false"/>
>> > > > >
>> > > > > <fieldType ... subFieldSuffix="_coordinate"/>
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > 2014-08-04 22:28 GMT-03:00 Michael Ryan :
>> > > > >
>> > > > > > Are the latLong_0_coordinate and latLong_1_coordinate fields
>> > > populated
>> > > > > > using copyField? If so, this sounds like it could be
>> > > > > > https://issues.apache.org/jira/browse/SOLR-3502.
>> > > > > >
>> > > > > > -Michael
>> > > > > >
>> > > > > > -Original Message-
>> > > > > > From: Franco Giacosa [mailto:fgiac...@gmail.com]
>> > > > > > Sent: Monday, August 04, 2014 9:05 PM
>> > > > > > To: solr-user@lucene.apache.org
>> > > > > > Subject: solr update dynamic field generates multiValued error
>> > > > > >
>> > > > > > Hello everyone, this is my first time posting a question, so
>> forgive
>> > > me
>> > > > > if
>> > > > > > i'm missing something.
>> > > > > >
>> > > > > > This is my problem:
>> > > > > >
>> > > > > > I have a schema.xml that has the following latLong information
>> > > > > >
>> > > > > > The dynamicField generates 2 dynamic fields that have the lat
>> and the
>> > > > > long
>> > > > > > (latLong_0_coordinate and latLong_1_coordinate)
>> > > > > >
>> > > > > > So for example a document will have
>> > > > > >
>> > > > > > "latLong_0_coordinate": 40.4114, "latLong_1_coordinate":
>> -74.1031,
>> > > > > > "latLong": "40.4114,-74.1031",
>> > > > > >
>> > > > > > Now when I try to update a document (i don't update the latLong
>> > > field.
>> > > > I
>> > > > > > just update other parts of the document using atomic update)
>> solr
>> > > > > > re-creates the dynamicField and adds the same value again, like
>> its
>> > > > using
>> > > > > > add instead of set. So when i do an update the fields of the
>> doc look
>> > > > > like
>> > > > > > this
>> > > > > >
>> > > > > > "latLong_0_coordinate": [40.4114,40.4114]
>> "latLong_1_coordinate":
>> > > > > > [-74.1031,-74.1031] "latLong": "40.4114,-74.1031",
>> > > > > >
>> > > > > > So the dynamicFields now have 2 values, so the next time that I
>> want
>> > > to
>> > > > > > update the document a schema error is thrown because I'm trying 

Re: SolrCloud Startup question

2015-09-21 Thread Anshum Gupta
Hi Ravi,

I just tried it out and here's my understanding:

1. Starting Solr with -c starts Solr in cloud mode. This is used to start
Solr with an embedded zookeeper.
2. Starting Solr with -z starts Solr in cloud mode, with the zk connection
string you specify. You don't need to explicitly specify -c in this case.
The help text there needs a bit of fixing though

*  -zZooKeeper connection string; only used when running in
SolrCloud mode using -c*
*   To launch an embedded ZooKeeper instance, don't pass
this parameter.*

*"only used when running in SolrCloud mode using -c" *needs to be rephrased
or removed. Can you create a JIRA for the same?


On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr  wrote:

> Can somebody kindly help me understand the difference between the following
> startup calls ?
>
> ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
>
> Vs
>
> ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
>
> What happens if i don't pass the "-c" option ?? I read the documentation
> but got more confused, I do run a ZK ensemble of 3 instances.  FYI my cloud
> seems to work fine and the Admin UI shows the Cloud graph just fine, but I want
> to just make sure I am doing the right thing and not missing any nuance.
>
> The following is from the documentation on cwiki.
> ---
>
> "Start Solr in SolrCloud mode, which will also launch the embedded
> ZooKeeper instance included with Solr.
>
> This option can be shortened to simply -c.
>
> If you are already running a ZooKeeper ensemble that you want to use
> instead of the embedded (single-node) ZooKeeper, you should also pass the
> -z parameter."
>
> -
>
> Thanks
>
> Ravi Kiran Bhaskar
>



-- 
Anshum Gupta
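
Summarized as commands (port illustrative; the embedded ZooKeeper binds the
Solr port + 1000, so 9983 here):

# cloud mode with the embedded ZooKeeper
./solr start -c -p 8983 -s /solr/home

# cloud mode against an external ensemble; -z alone implies cloud mode
./solr start -p 8983 -s /solr/home -z zk1:2181,zk2:2181,zk3:2181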


SolrCloud Startup question

2015-09-21 Thread Ravi Solr
Can somebody kindly help me understand the difference between the following
startup calls ?

./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181

Vs

./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181

What happens if i don't pass the "-c" option ?? I read the documentation
but got more confused, I do run a ZK ensemble of 3 instances.  FYI my cloud
seems to work fine and the Admin UI shows the Cloud graph just fine, but I want
to just make sure I am doing the right thing and not missing any nuance.

The following is from the documentation on cwiki.
---

"Start Solr in SolrCloud mode, which will also launch the embedded
ZooKeeper instance included with Solr.

This option can be shortened to simply -c.

If you are already running a ZooKeeper ensemble that you want to use
instead of the embedded (single-node) ZooKeeper, you should also pass the
-z parameter."

-

Thanks

Ravi Kiran Bhaskar


Re: SolrCloud Startup question

2015-09-21 Thread Upayavira
As it says below, -c enables a Zookeeper node within the same JVM as
Solr. You don't want that, as you already have an ensemble up and
running.

Upayavira

On Mon, Sep 21, 2015, at 09:35 PM, Ravi Solr wrote:
> Can somebody kindly help me understand the difference between the
> following
> startup calls ?
> 
> ./solr start -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> 
> Vs
> 
> ./solr start -c -p  -s /solr/home -z zk1:2181,zk2:2181,zk3:2181
> 
> What happens if i don't pass the "-c" option ?? I read the documentation
> but got more confused, I do run a ZK ensemble of 3 instances.  FYI my
> cloud
> seems to work fine and the Admin UI shows the Cloud graph just fine, but I
> want
> to just make sure I am doing the right thing and not missing any nuance.
> 
> The following is from the documentation on cwiki.
> ---
> 
> "Start Solr in SolrCloud mode, which will also launch the embedded
> ZooKeeper instance included with Solr.
> 
> This option can be shortened to simply -c.
> 
> If you are already running a ZooKeeper ensemble that you want to use
> instead of the embedded (single-node) ZooKeeper, you should also pass the
> -z parameter."
> 
> -
> 
> Thanks
> 
> Ravi Kiran Bhaskar


Re: FieldCache error for multivalued fields in json facets.

2015-09-21 Thread Vishnu Mishra
Hi, I am using Solr 5.3 and I have the same problem when doing a JSON facet on a
multivalued field. Below is the error stack trace:




2015-09-21 21:26:09,292 ERROR org.apache.solr.core.SolrCore  ?
org.apache.solr.common.SolrException: can not use FieldCache on multivalued
field: FLAG
at
org.apache.solr.schema.SchemaField.checkFieldCacheSource(SchemaField.java:187)
at
org.apache.solr.schema.TrieField.getValueSource(TrieField.java:231)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:378)
at
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:235)
at
org.apache.solr.search.ValueSourceParser$79.parse(ValueSourceParser.java:845)
at
org.apache.solr.search.FunctionQParser.parseAgg(FunctionQParser.java:414)
at
org.apache.solr.search.facet.FacetParser.parseStringStat(FacetRequest.java:272)
at
org.apache.solr.search.facet.FacetParser.parseStringFacetOrStat(FacetRequest.java:265)
at
org.apache.solr.search.facet.FacetParser.parseFacetOrStat(FacetRequest.java:199)
at
org.apache.solr.search.facet.FacetParser.parseSubs(FacetRequest.java:179)
at
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:427)
at
org.apache.solr.search.facet.FacetTopParser.parse(FacetRequest.java:416)
at
org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:125)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:251)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:617)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1521)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1478)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)
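
For what it's worth, the trace shows the facet parser handing FLAG to the
function parser, i.e. it was referenced inside an aggregation. A terms facet
over the multivalued field itself is fine; it is the stat form that needs a
single-valued (or otherwise FieldCache-safe) field. A hypothetical sketch:

json.facet={ flags : { type : terms, field : FLAG } }    works on multiValued

json.facet={ flagSum : "sum(FLAG)" }                     trips the check above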




--
View this message in context: 
http://lucene.472066.n3.nabble.com/FieldCache-error-for-multivalued-fields-in-json-facets-tp4216995p4230304.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Gili Nachum
Thanks for the in-depth explanation!

The secondary sort by uuid would allow me to read a series of docs with
identical time over multiple batches, by filtering
time>timeOnLastReadDoc OR (time=timeOnLastReadDoc AND
uuid>uuidOnLastReadDoc), which essentially creates a unique sorted value to
track progress over.
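
Sketched as query parameters (field names from the earlier messages; the
literal timestamp and LAST_UUID are placeholders), with an ascending sort so
the cursor is deterministic:

sort=last_updated_in_solr asc, id asc
fq=last_updated_in_solr:{2015-09-21T10:00:00Z TO *] OR
   (last_updated_in_solr:"2015-09-21T10:00:00Z" AND id:{LAST_UUID TO *])

Note that Solr 4.7 also introduced cursorMark deep paging, which tracks
progress over a unique sort in essentially the same way.
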
On Sep 21, 2015 19:56, "Shawn Heisey"  wrote:

> On 9/21/2015 9:01 AM, Gili Nachum wrote:
> > TimestampUpdateProcessorFactory takes place only on the leader shard, or
> on
> > each shard replica?
> > if on each replica then I would get different values on each replica.
> >
> > My alternative would be to perform secondary sort on a UUID to ensure
> order.
>
> If the update chain is configured properly, it runs on the leader, so
> all replicas get the same timestamp.
>
> Without SolrCloud, the way to create an "indexed at" time field is in
> the schema -- specify a default value of NOW on the field definition and
> don't send the field when indexing.  The old master/slave replication
> copies the actual index contents, so the indexed values in all replicas
> are the same.
>
> The problem with NOW in the schema when running SolrCloud is that each
> replica indexes the document independently, so each replica can have a
> different timestamp.  This is why the timestamp update processor exists
> -- to set the timestamp to a specific value before the document is
> duplicated to each replica, eliminating the problem.
>
> FYI, secondary sort parameters affect the order when the primary sort
> field is identical between two documents.  It may not do what you are
> intending because of that.
>
> Thanks,
> Shawn
>
>


Re: solr4.7: leader core does not get elected to other active core after solr OS shutdown, known issue?

2015-09-21 Thread Gili Nachum
Happens to us too. Solr 4.7.2
On Sep 21, 2015 20:42, "Jeff Wu"  wrote:

> Hi Shai, still the same question: other peer cores, which are active, did
> not claim leadership even after a long time. However, some of the peer
> cores claimed leadership earlier, while the server was stopping. Those are
> inconsistent results.
>
> 2015-09-21 10:52 GMT-04:00 Shai Erera :
>
> > I don't think the process Shalin describes applies to clusterstate.json.
> > That JSON object reflects the status Solr "knows" about, or "last known
> > status". When Solr is properly shutdown, I believe those attributes are
> > cleared from clusterstate.json, as well the leaders give up their lease.
> >
> > However, when Solr is killed, it takes ZK the 30 seconds or so timeout to
> > kill the ephemeral node and release the leader lease. ZK is unaware of
> > Solr's clusterstate.json and cannot update the 'leader' property to
> false.
> > It simply releases the lease, so that other cores may claim it.
> >
> > Perhaps that explains the confusion?
> >
> > Shai
> >
> > On Mon, Sep 21, 2015 at 4:36 PM, Jeff Wu  wrote:
> >
> > > Hi Shalin,  thank you for the response.
> > >
> > > We waited longer enough than the ZK session timeout time, and it still
> > did
> > > not kick off any leader election for these "remained down-leader"
> cores.
> > > That's the question I'm actually asking.
> > >
> > > Our test scenario:
> > >
> > > Each solr server has 64 cores, and they are all active, and all leader
> > > cores.
> > > Shutdown the linux OS.
> > > Monitor clusterstate.json over ZK, after enough ZK session timeout
> value.
> > > We noticed some cores has leader election happened. But still saw some
> > down
> > > cores remains leader.
> > >
> > > 2015-09-21 9:15 GMT-04:00 Shalin Shekhar Mangar <
> shalinman...@gmail.com
> > >:
> > >
> > > > Hi Jeff,
> > > >
> > > > The leader election relies on ephemeral nodes in Zookeeper to detect
> > > > when leader or other nodes have gone down (abruptly). These ephemeral
> > > > nodes are automatically deleted by ZooKeeper after the ZK session
> > > > timeout which is by default 30 seconds. So if you kill a node then it
> > > > can take up to 30 seconds for the cluster to detect it and start a
> new
> > > > leader election. This won't be necessary during a graceful shutdown
> > > > because on shutdown the node will give up leader position so that a
> > > > new one can be elected. You could tune the zk session timeout to a
> > > > lower value but then it makes the cluster more sensitive to GC pauses
> > > > which can also trigger new leader elections.
> > > >
> > > > On Mon, Sep 21, 2015 at 5:55 PM, Jeff Wu  wrote:
> > > > > Our environment still runs with Solr 4.7. Recently we noticed in a
> > test.
> > > > When
> > > > > we stopped 1 solr server(solr02, which did OS shutdown), all the
> > cores
> > > of
> > > > > solr02 are shown as "down", but remains a few cores still as
> leaders.
> > > > After
> > > > > that, we quickly seeing all other servers are still sending
> requests
> > to
> > > > > that down solr server, and therefore we saw a lot of TCP waiting
> > > threads
> > > > in
> > > > > thread pool of other solr servers since solr02 already down.
> > > > >
> > > > > "shard53":{
> > > > > "range":"2666-2998",
> > > > > "state":"active",
> > > > > "replicas":{
> > > > >   "core_node102":{
> > > > > "state":"down",
> > > > > "base_url":"https://solr02.myhost/solr",
> > > > > "core":"collection2_shard53_replica1",
> > > > > "node_name":"https://solr02.myhost_solr",
> > > > > "leader":"true"},
> > > > >   "core_node104":{
> > > > > "state":"active",
> > > > > "base_url":"https://solr04.myhost/solr",
> > > > > "core":"collection2_shard53_replica2",
> > > > > "node_name":"https://solr04.myhost/solr_solr"}}},
> > > > >
> > > > > Is this a known bug in 4.7 that was later fixed? Any reference
> > > JIRA
> > > > we
> > > > > can study about?  If the solr service is stopped gracefully, we can
> > see
> > > > > leader core election happens and switched to other active core. But
> > if
> > > we
> > > > > just directly shutdown a Solr OS, we can reproduce in our
> environment
> > > > that
> > > > > some "Down" cores remains "leader" at ZK clusterstate.json
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Shalin Shekhar Mangar.
> > > >
> > >
> >
>
>
>
> --
> Jeff Wu
> ---
> CSDL Beijing, China
>


Re: write.lock

2015-09-21 Thread Mikhail Khludnev
Both of these guys below try to write the spell index into the same dir, don't they?

To make it clear: that's not supported so far.

 
<lst name="spellchecker">
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <str name="spellcheckIndexDir">/localapps/dev/EventLog/index</str>
  ...

<lst name="spellchecker">
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="spellcheckIndexDir">/localapps/dev/EventLog/index</str>
  ...

Also, can you make sure that this path doesn't lead to the main index dir?
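
If the intent was one index per checker, giving each its own directory,
distinct from the main index's data dir, avoids the lock contention (paths
illustrative):

<str name="spellcheckIndexDir">/localapps/dev/EventLog/spellidx_index</str>
...
<str name="spellcheckIndexDir">/localapps/dev/EventLog/spellidx_file</str>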


On Mon, Sep 21, 2015 at 5:13 PM, Mark Fenbers  wrote:

> A snippet of my solrconfig.xml is attached.  The snippet only contains the
> Spell checking sections (for brevity) which should be sufficient for you to
> see all the pertinent info you seek.
>
> Thanks!
> Mark
>
>
> On 9/19/2015 3:29 AM, Mikhail Khludnev wrote:
>
>> Mark,
>>
>> What's your solrconfig.xml?
>>
>> On Sat, Sep 19, 2015 at 12:34 AM, Mark Fenbers 
>> wrote:
>>
>> Greetings,
>>>
>>> Whenever I try to build my spellcheck index
>>> (params.set("spellcheck.build", true); or put a check in the
>>> spellcheck.build box in the web interface) I get the following
>>> stacktrace.
>>> Removing the write.lock file does no good.  The message comes right back
>>> anyway.  I read in a post that increasing writeLockTimeout would help.
>>> It
>>> did not help for me even increasing it to 20,000 msec.  If I don't build,
>>> then my resultset count is always 0, i.e., empty results.  What could be
>>> causing this?
>>>
>>> Mark
>>>
>>>
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: ctargett commented on http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html

2015-09-21 Thread Cassandra Targett
Hey folks,

I'm doing some experiments with other formats for the Ref Guide and playing
around with options for comments. I didn't realize this old experiment from
https://issues.apache.org/jira/browse/SOLR-4889 would send email - I'm
talking to Steve Rowe to see if we can get that disabled.

Cassandra

On Mon, Sep 21, 2015 at 2:06 PM,  wrote:

> Hello,
> ctargett has commented on
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html
> .
> You can find the comment here:
>
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html#comment_4535
> Please note that if the comment contains a hyperlink, it must be
> approved
> before it is shown on the site.
>
> Below is the reply that was posted:
> 
> This is a test of the comments system.
> 
>
> With regards,
> Apache Solr Cwiki.
>
> You are receiving this email because you have subscribed to changes
> for the solrcwiki site.
> To stop receiving these emails, unsubscribe from the mailing list that
> is providing these notifications.
>
>


Re: Zero Query results

2015-09-21 Thread Erick Erickson
Screen captures generally get filtered out by the Apache e-mail servers; it
didn't come through.

But this makes no sense. The text_en field type you pasted should not
be having the problems you're talking about.

So if you add debug=true, you should be seeing your "Wednesday" query
going against your field. If you just type q=Wednesday, then the
default field is used, the "df" parameter in the request handler
you're using.
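
For example (hypothetical core name):

http://localhost:8983/solr/eventlog/select?q=logtext:Wednesday&debug=true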

Best,
Erick

On Mon, Sep 21, 2015 at 12:57 PM, Mark Fenbers  wrote:
> You were right about finding only the Wednesday occurrences at the beginning
> of the line.  But attached (if it works) is a screen capture of my admin UI.
> But unlike your suspicion, the index text is being parsed properly, it
> appears.  So I'm uncertain where this leads me.
>
> Also attached is the pertinent schema.xml snippet you asked for.
>
> The logtext column in my table contains merely keyboarded text, with the
> infrequent exception that I add a \uFFFC as a placeholder for images.  So,
> should I be using something besides text_en as the fieldType?
>
> Thanks,
> Mark
>
> On 9/21/2015 12:12 PM, Erick Erickson wrote:
>>
>> bq: However, I discovered that if I search on "Wednesday*" (trailing
>> asterisk), then I get all the results containing Wednesday that I'm
>> looking for!
>>
>> This almost always means you're not searching on the field you think
>> you're searching on and/or the field isn't being analyzed as you think
>> (i.e. the fieldType isn't what you expect). If you're really searching
>> on a fieldType of text_en (and you haven't changed the definition),
>> then there's something very weird here. FieldTypes are totally
>> mutable, they are composed of various analysis chains that you (or
>> someone else) can freely alter, so seeing the <field> definition that
>> references a type="text_en" is suggestive but not definitive.
>>
>> I'm going to further guess that when you search on "Wednesday*", all
>> the matches are at the beginning of the line, and you find docs where
>> the field has "Wednesday, September" but not "The party was on
>> Wednesday".
>>
>> So let's see the <fieldType> associated with the logtext field. Plus,
>> the results of adding &debug=true to the query.
>>
>> But you can get a lot of info a lot faster if you go to the admin UI
>> screen, select the proper core from the drop-down on the left side and
>> go to the "analysis" section. Pick the field (or field type), enter
>> some text and hit analyze (or uncheck the "verbose" box, that's
>> largely uninteresting info at this level). That'll show you exactly
>> how the input document is parsed, exactly how the query is parsed etc.
>> And be sure to enter something like
>> "september first was a Wednesday" in the left-hand (index) box, then
>> just "Wednesday" in the right hand (query) side. My bet: You'll see on
>> the index side that the input is not broken up, not transformed, etc.
>>
>> Best,
>> Erick
>>
>


ctargett commented on http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html

2015-09-21 Thread no-reply
Hello,
ctargett has commented on 
http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html. 
You can find the comment here:

http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html#comment_4535
Please note that if the comment contains a hyperlink, it must be approved
before it is shown on the site.

Below is the reply that was posted:

This is a test of the comments system.


With regards,
Apache Solr Cwiki.

You are receiving this email because you have subscribed to changes for the 
solrcwiki site.
To stop receiving these emails, unsubscribe from the mailing list that is 
providing these notifications.



Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Aman Tandon
Hi Erick,

I am getting the same error because my dynamic field *_coordinate is
stored="true".
How can I get rid of this error?

And I have to use the atomic update. Please help!!

With Regards
Aman Tandon

On Tue, Aug 5, 2014 at 10:27 PM, Franco Giacosa  wrote:

> Hey Erick, i think that you were right, there was a mix in the schemas and
> that was generating the error on some of the documents.
>
> Thanks for the help guys!
>
>
> 2014-08-05 1:28 GMT-03:00 Erick Erickson :
>
> > Hmmm, I just tried this with a 4.x build and I can update the document
> > multiple times without a problem. I just indexed the standard exampledocs
> > and then updated a doc like this (vidcard.xml was the base):
> >
> > <add>
> > <doc>
> >   <field name="id">EN7800GTX/2DHTV/256M</field>
> >
> >   <field name="name" update="set">eoe changed this puppy</field>
> > </doc>
> > </add>
> >
> > I'm not getting any multiple values in the _coordinate fields. However, I
> > _do_ get the error if my dynamic *_coordinate field is set to
> > stored="true".
> >
> > Did you perhaps change this at some point? Whenever I change the schema,
> I
> > try to 'rm -rf solr/collection/data' just to be sure I've purged all
> traces
> > of the former schema definition.
> >
> > Best,
> > Erick
> >
> >
> > On Mon, Aug 4, 2014 at 7:04 PM, Franco Giacosa 
> wrote:
> >
> > > No, they are not declared explicitly.
> > >
> > > This is how they are created:
> > >
> > > <field name="latLong" ... stored="true"/>
> > >
> > > <dynamicField name="*_coordinate" ... stored="false"/>
> > >
> > > <fieldType ... subFieldSuffix="_coordinate"/>
> > >
> > >
> > >
> > >
> > > 2014-08-04 22:28 GMT-03:00 Michael Ryan :
> > >
> > > > Are the latLong_0_coordinate and latLong_1_coordinate fields
> populated
> > > > using copyField? If so, this sounds like it could be
> > > > https://issues.apache.org/jira/browse/SOLR-3502.
> > > >
> > > > -Michael
> > > >
> > > > -Original Message-
> > > > From: Franco Giacosa [mailto:fgiac...@gmail.com]
> > > > Sent: Monday, August 04, 2014 9:05 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: solr update dynamic field generates multiValued error
> > > >
> > > > Hello everyone, this is my first time posting a question, so forgive
> me
> > > if
> > > > i'm missing something.
> > > >
> > > > This is my problem:
> > > >
> > > > I have a schema.xml that has the following latLong information
> > > >
> > > > The dynamicField generates 2 dynamic fields that have the lat and the
> > > long
> > > > (latLong_0_coordinate and latLong_1_coordinate)
> > > >
> > > > So for example a document will have
> > > >
> > > > "latLong_0_coordinate": 40.4114, "latLong_1_coordinate": -74.1031,
> > > > "latLong": "40.4114,-74.1031",
> > > >
> > > > Now when I try to update a document (I don't update the latLong
> > > > field; I just update other parts of the document using atomic
> > > > update), Solr re-creates the dynamicField and adds the same value
> > > > again, like it's using add instead of set. So when I do an update
> > > > the fields of the doc look like this
> > > >
> > > > "latLong_0_coordinate": [40.4114,40.4114] "latLong_1_coordinate":
> > > > [-74.1031,-74.1031] "latLong": "40.4114,-74.1031",
> > > >
> > > > So the dynamicFields now have 2 values, so the next time that I want
> > > > to update the document a schema error is thrown because I'm trying
> > > > to store a collection into a non-multiValued field.
> > > >
> > > >
> > > > Thanks in advance.
> > > >
> > >
> >
>


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Aman Tandon
Error is

<response>
  <lst name="responseHeader">
    <int name="status">400</int><int name="QTime">28</int>
  </lst>
  <lst name="error">
    <str name="msg">ERROR: [doc=9474144846] multiple values encountered
for non multiValued field latlon_0_coordinate: [11.0183, 11.0183]</str>
    <int name="code">400</int>
  </lst>
</response>

And my configuration is

<field name="latlon" type="location" indexed="true" stored="true"/>

<dynamicField name="*_coordinate" type="tdouble" indexed="true"
stored="true"/>

<fieldType name="location" class="solr.LatLonType"
subFieldSuffix="_coordinate"/>
> how you know it is because of stored="true"?

As Erick replied in the last mail thread,
I'm not getting any multiple values in the _coordinate fields. However, I
_do_ get the error if my dynamic *_coordinate field is set to stored="true".

And stored="true" is mandatory for using the atomic updates.
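
For reference, a minimal sketch of the workaround usually suggested for
this (an assumption on my part that the *_coordinate subfields are only
used internally by LatLonType): keep the source field stored, but mark the
derived subfields stored="false", since they behave like copyField
destinations and are re-added on every atomic update when stored:

<field name="latlon" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true"
 stored="false"/>

The subfields are then rebuilt from the stored latlon value during the
update instead of being duplicated; stored="true" is only needed on the
fields whose values you want preserved, not on derived ones.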

With Regards
Aman Tandon

On Mon, Sep 21, 2015 at 2:22 PM, Upayavira  wrote:

> Can you show the error you are getting, and how you know it is because
> of stored="true"?
>
> Upayavira
>
> On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
> > Hi Erick,
> >
> > I am getting the same error because my dynamic field *_coordinate is
> > stored="true".
> > How can I get rid of this error?
> >
> > And I have to use the atomic update. Please help!!
> >
> > With Regards
> > Aman Tandon
> >


Re: Does more shards in core improve performance?

2015-09-21 Thread Zheng Lin Edwin Yeo
I'm not sure if that is because my machine is currently a normal PC and
not a server, but the CPU is an Intel(R) Core(TM) i7-4910MQ @ 2.90GHz.

It should probably be better when the real server, which has a much better
specification, arrives, and I should be able to do the indexing in less
time using the knowledge that I've learnt here.


Regards,
Edwin



On 21 September 2015 at 16:00, Toke Eskildsen 
wrote:

> On Mon, 2015-09-21 at 10:13 +0800, Zheng Lin Edwin Yeo wrote:
> > I didn't find any increase in indexing throughput by adding shards in the
> > same machine.
> >
> > However, I've managed to feed the index to Solr from more than one thread
> > at a time. It can take up to 3 threads without affecting the indexing
> > speed. Anything more than that, the CPU will hit 100%, and the indexing
> > speed in all the threads will be reduced.
>
> It is a bit surprising that the limit is 3 Threads on an 8 core machine,
> but I am happy to hear that your findings fit the overall theory.
>
>
> Thank you for the verification,
> Toke Eskildsen, State and University Library, Denmark
>
>
>


Re: solr update dynamic field generates multiValued error

2015-09-21 Thread Upayavira
Can you show the error you are getting, and how you know it is because
of stored="true"?

Upayavira

On Mon, Sep 21, 2015, at 09:30 AM, Aman Tandon wrote:
> Hi Erick,
> 
> I am getting the same error because my dynamic field *_coordinate is
> stored="true".
> How can I get rid of this error?
> 
> And I have to use the atomic update. Please help!!
> 
> With Regards
> Aman Tandon
> 


Re: Questions regarding indexing JSON data

2015-09-21 Thread Upayavira


On Mon, Sep 21, 2015, at 02:53 AM, Kevin Vasko wrote:
> I am new to Apache Solr and have been struggling with indexing some JSON
> files.
> 
> I have several TB of twitter data in JSON format that I am having trouble
> posting/indexing. I am trying to use a schemaless schema so I don't have
> to add 200+ record fields manually.
> 
> 1.
> 
> The first issue is that none of the records have '[' or ']' wrapped
> around them, so each one looks like this:
> 
>  { "created_at": "Sun Apr 19 23:45:45 +0000 2015", "id":
>  5.899379634353e+17, "id_str": "589937963435302912", ... }
> 
> 
> Just to validate that the schemaless portion was working, I used a single
> "tweet" and trimmed it down to the bare minimum. The brackets missing
> from the original appear to be a problem: when I tried to process just a
> small portion of one record, it required me to wrap the row in [ ] (I
> assume to make it an array) to index correctly, like the following:
> 
> [{ "created_at": "Sun Apr 19 23:45:45 +0000 2015", "id":
> 5.899379634353e+17, "id_str": "589937963435302912", ... }]
> 
> Is there a way around this? I didn't want to preprocess the TBs of JSON
> data that is in this format to add '[', ',' and ']' around all of the
> data.
> 
> 2. 
> 
> The second issue is that some of the fields have null values.
> e.g. "in_reply_to_status_id": null,
> 
> I think I figured out a way to resolve this by manually adding the field
> as a "strings" type, but if I miss one it will kick the file out. Just
> wanted to see if there was something I could add to the schemaless
> configuration to have it pick up null fields and replace them with
> strings automatically? Or is there a better way to handle this?
> 
> 
> 3. 
> The last issue is, I think, the most difficult: dealing with
> "nested" or "children" fields in my JSON data.
> 
> The data looks like this: https://gist.github.com/gnip/764239. Is there
> any way to index this information, preferably automatically (schemaless
> method), without having to flatten all of my data?


1. Solr is designed to handle large amounts of content. You don't want
to be pushing documents one at a time, as you will be wasting huge
amounts of effort needlessly. Therefore, Solr assumes that when it
receives JSON, it will be in an array of documents. IIRC, when you post
an object {}, it will be considered a partial update instruction.
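
For illustration, a minimal sketch of the expected shape (the URL, the
collection name "tweets" and the commit parameter are assumptions; the
field names are trimmed from the sample above):

curl 'http://localhost:8983/solr/tweets/update?commit=true' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {"id": "589937963435302912",
     "created_at": "Sun Apr 19 23:45:45 +0000 2015"},
    {"id": "589937963435302913",
     "created_at": "Sun Apr 19 23:45:46 +0000 2015"}
  ]'

A small streaming script that wraps each batch in [ ] and joins the
records with commas avoids rewriting the TBs of files on disk.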

2. Don't rely upon the schemaless setup. Define your schema - you can't
actually live without one. Relying upon the data to work it out for you
is fraught with risk. Whether you define it via HTTP calls, or via
editing an XML file, is up to you. Just don't rely upon it correctly
guessing.
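
For instance, a sketch of defining one field up front via the Schema API
(the endpoint shape is as in recent Solr versions with a managed schema;
the collection name and field type here are assumptions):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/tweets/schema' --data-binary '{
    "add-field": {"name": "in_reply_to_status_id",
                  "type": "string",
                  "stored": true}
  }'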

Also, when you have a 'null', the equivalent in Solr is to omit the
field. There is typically no concept in Solr for storing a null value.
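
In other words, a preprocessing sketch (values taken from the sample
above): strip null-valued keys before posting.

  Instead of:  {"id": "589937963435302912", "in_reply_to_status_id": null}
  send:        {"id": "589937963435302912"}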

3. Look at block joins, they may well help. But remember a Lucene index
is currently largely flat - you won't get anything like the versatility
out of it that you would from a relational database (in relation to
nested structures) as that isn't what it was designed for. Really,
you're gonna want to identify what you want OUT of your data, and then
identify a data structure that will allow you to achieve it. You cannot
assume that there is a standard way of doing it that will support every
use-case.
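
As a rough sketch of the block-join shape (all names here are made up, and
the exact syntax depends on your Solr version): index each tweet with its
nested entities as one block, then search children and return parents with
the parent query parser:

<add>
  <doc>
    <field name="id">tweet-1</field>
    <field name="doc_type">tweet</field>
    <doc>
      <field name="id">tweet-1-mention-1</field>
      <field name="doc_type">mention</field>
      <field name="screen_name">example_user</field>
    </doc>
  </doc>
</add>

q={!parent which="doc_type:tweet"}screen_name:example_user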

Upayavira 



Re: How can I get a monotonically increasing field value for docs?

2015-09-21 Thread Upayavira
There's nothing to stop you creating your own
TimestampUpdateProcessorFactory; here's the entire source for it:

package org.apache.solr.update.processor;

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrRequestInfo;
import org.apache.solr.response.SolrQueryResponse;

public class TimestampUpdateProcessorFactory
    extends AbstractDefaultValueUpdateProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    // Supplies the request's NOW as the default value for the configured
    // field whenever an incoming document does not already provide one.
    return new DefaultValueUpdateProcessor(fieldName, next) {
      @Override
      public Object getDefaultValue() {
        return SolrRequestInfo.getRequestInfo().getNOW();
      }
    };
  }
}

Effectively, all it does is return the value of NOW according to the
request, as the default value.

You could construct that on a per-invocation basis, using
System.currentTimeMillis() or whatever.
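
A minimal sketch of that idea (untested; the class name is made up, the
value is only monotonic within a single JVM rather than across a SolrCloud
cluster, and you would store it in a tlong field rather than a tdate):

package org.apache.solr.update.processor;

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

public class MonotonicValueUpdateProcessorFactory
    extends AbstractDefaultValueUpdateProcessorFactory {

  private long last = 0L;

  // Strictly increasing even when many docs arrive in the same
  // millisecond: shift the clock left to leave room below it, then
  // always move at least one step past the previous value.
  private synchronized long nextValue() {
    long now = System.currentTimeMillis() << 20;
    last = Math.max(last + 1, now);
    return last;
  }

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new DefaultValueUpdateProcessor(fieldName, next) {
      @Override
      public Object getDefaultValue() {
        return nextValue();
      }
    };
  }
}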

Upayavira

On Mon, Sep 21, 2015, at 07:34 AM, Gili Nachum wrote:
> I've implemented a custom solr2solr ongoing unidirectional replication
> mechanism.
> 
> A Replicator (acting as solrJ client), crawls documents from SolrCloud1
> and
> writes them to SolrCloud2 in batches.
> The replicator crawl logic is to read documents with a time greater than
> or equal to the time of the last replicated document.
> Whenever a document is added/updated, I auto-update a tdate field
> "last_updated_in_solr" using TimestampUpdateProcessorFactory.
> 
> *My problem: *When a client indexes a batch of 100 documents, all 100
> docs
> have the same "last_updated_in_solr" value. This makes my ongoing
> replication check for new documents to replicate much more complex than
> if
> the time value was unique.
> 
> 1. Can I use some other processor to generate increasing unique values?
> 2. Can I use the internal _version_ field for this? is it guaranteed to
> be
> monotonically increasing for the entire collection or only per document,
> with each add/update?
> Any other options?
> 
> Schema.xml:
> <field name="last_updated_in_solr" type="tdate" indexed="true"
> stored="true" multiValued="false"/>
> 
> solrconfig.xml:
> <updateRequestProcessorChain>
>   <processor class="solr.TimestampUpdateProcessorFactory">
>     <str name="fieldName">last_updated_in_solr</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
> 
> I know there's work on a built-in replication mechanism, but it's not yet
> released.
> Using Solr 4.7.2.