Tweaking boosts for more search results variety

2013-09-04 Thread Sai Gadde
Our index aggregates content from various sites on the web. We want a good
user experience by showing multiple sites in the search results, but in our
setup most of the top results come from the same site.

Here is some information regarding queries and schema
site - String field. We have about 1000 sites in the index.
sitetype - String field. We have 3 site types.
omitNorms="true" is set for both fields.

Doc counts vary widely by site and sitetype, by a factor of 10 to 1000.
Total index size is about 5 million docs.
Solr Version: 4.0

In our queries we have a fixed, preferential boost for certain sites.
sitetype has a different fixed boost for each of its 3 possible values. We
turned off inverse document frequency (IDF) so that these boosts work
properly. Other text fields are boosted based on search keywords only.

With this setup we often see a run of hits from a single site, followed by a
run from the next site, and so on.
Is there a way to get results from a variety of sites while still keeping
the preferential boosts in place?
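One approach available in the poster's Solr 4.0 is result grouping (field collapsing): group=true&group.field=site&group.limit=1 returns only the top hit per site while leaving all boosts intact. The interleaving that grouping achieves can be sketched in plain Java as a round-robin over per-site buckets (the class and data below are illustrative, not from the original post):

```java
import java.util.*;

// Round-robin interleaving of search hits by site: take the best hit from
// each site in score order, then the second-best from each, and so on.
// This mirrors what group=true&group.field=site achieves server-side.
public class DiversifyResults {

    // Each hit is (site, docId); input is assumed already sorted by score desc.
    public static List<String> roundRobin(List<String[]> hits) {
        // Bucket hits per site, preserving score order inside each bucket.
        Map<String, Deque<String>> buckets = new LinkedHashMap<>();
        for (String[] h : hits) {
            buckets.computeIfAbsent(h[0], k -> new ArrayDeque<>()).add(h[1]);
        }
        // Emit one hit per site per pass until all buckets are empty.
        List<String> out = new ArrayList<>();
        while (!buckets.isEmpty()) {
            Iterator<Map.Entry<String, Deque<String>>> it =
                buckets.entrySet().iterator();
            while (it.hasNext()) {
                Deque<String> q = it.next().getValue();
                out.add(q.poll());
                if (q.isEmpty()) it.remove();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> hits = Arrays.asList(
            new String[]{"siteA", "a1"}, new String[]{"siteA", "a2"},
            new String[]{"siteA", "a3"}, new String[]{"siteB", "b1"},
            new String[]{"siteC", "c1"});
        System.out.println(roundRobin(hits)); // [a1, b1, c1, a2, a3]
    }
}
```

In Solr itself the grouping component does this server-side, so the preferential site and sitetype boosts still determine which document represents each site and the order of the groups.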


Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
Yes sir, I did restart Tomcat.


On Wed, Sep 4, 2013 at 6:27 PM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4088181...@n3.nabble.com> wrote:

> Did you restart Solr after editing config and schema?
>
> -- Jack Krupansky
>
> -Original Message-
> From: Nutan
> Sent: Wednesday, September 04, 2013 3:07 AM
> To: [hidden email] 
> Subject: unknown _stream_source_info while indexing rich doc in solr
>
> I am using Solr 4.2 on Windows 7.
> My schema is:
>  required="true"/>
>  multiValued="true"/>
>  multiValued="false"/>
>  multiValued="false"/>
>  multiValued="false"/>
>  multiValued="false"/>
>  stored="true" multiValued="false"/>
> 
>
> solrconfig.xml :
> <requestHandler name="/update/extract"
>     class="solr.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="fmap.content">contents</str>
>     <str name="lowernames">true</str>
>     <str name="uprefix">ignored_</str>
>     <str name="captureAttr">true</str>
>   </lst>
> </requestHandler>
>
> when i execute:
> curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" \
>   -F "myfile=@abc.txt"
>
> I get the error: unknown field ignored_stream_source_info.
>
> I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not
> resolved. Please help me.
>
>
>
>
> --
> View this message in context:
>
> http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>




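One common resolution for the error in this thread, assuming the handler maps unknown Tika metadata with uprefix=ignored_ as in the stock example config: declare a catch-all "ignored" field type and dynamic field in schema.xml, so that unmapped metadata such as stream_source_info is silently dropped instead of rejected:

```
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>

<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```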

Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi all,
   I solved the problem by adding the coreName explicitly, per
http://wiki.apache.org/solr/SolrReplication#Replicating_solrconfig.xml.

   But I want to make sure: is it necessary to set the coreName
explicitly? And is there any SolrJ API to pull replication on the slave
node from the master node?
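On the SolrJ question: there is no dedicated replication-pull call in the 4.x SolrJ API that I know of, but the slave's ReplicationHandler exposes HTTP commands that any client (including a plain SolrJ/HttpClient GET) can invoke. Per the SolrReplication wiki, a one-off pull looks like this (host, port, and core name are placeholders):

```
http://slave_host:8080/solr/corename/replication?command=fetchindex
```

An optional masterUrl parameter on that request can override the configured master for that one fetch.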


regards



2013/9/5 YouPeng Yang 

> Hi again
>
>   I'm  using Solr4.4.
>
>
> 2013/9/5 YouPeng Yang 
>
>> HI solrusers
>>
>>I'm testing the replication within SolrCloud .
>>I just uncomment the replication section separately on the master and
>> slave node.
>>The replication section setting on the master node:
>> <lst name="master">
>>   <str name="replicateAfter">commit</str>
>>   <str name="replicateAfter">startup</str>
>>   <str name="confFiles">schema.xml,stopwords.txt</str>
>> </lst>
>>  and on the slave node:
>> <lst name="slave">
>>   <str name="masterUrl">http://10.7.23.124:8080/solr/#/</str>
>>   <str name="pollInterval">00:00:50</str>
>> </lst>
>>
>>After startup, an Error comes out on the slave node :
>> 80110110 [snapPuller-70-thread-1] ERROR
>> org.apache.solr.handler.SnapPuller  ?.Master at:
>> http://10.7.23.124:8080/solr/#/ is not available. Index fetch failed.
>> Exception: Invalid version (expected 2, but 60) or the data in not in
>> 'javabin' format
>>
>>
>>  Could anyone help me to solve the problem ?
>>
>>
>> regards
>>
>>
>>
>>
>


Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi again

  I'm  using Solr4.4.


2013/9/5 YouPeng Yang 

> HI solrusers
>
>I'm testing the replication within SolrCloud .
>I just uncomment the replication section separately on the master and
> slave node.
>The replication section setting on the master node:
> <lst name="master">
>   <str name="replicateAfter">commit</str>
>   <str name="replicateAfter">startup</str>
>   <str name="confFiles">schema.xml,stopwords.txt</str>
> </lst>
>  and on the slave node:
> <lst name="slave">
>   <str name="masterUrl">http://10.7.23.124:8080/solr/#/</str>
>   <str name="pollInterval">00:00:50</str>
> </lst>
>
>After startup, an Error comes out on the slave node :
> 80110110 [snapPuller-70-thread-1] ERROR
> org.apache.solr.handler.SnapPuller  ?.Master at:
> http://10.7.23.124:8080/solr/#/ is not available. Index fetch failed.
> Exception: Invalid version (expected 2, but 60) or the data in not in
> 'javabin' format
>
>
>  Could anyone help me to solve the problem ?
>
>
> regards
>
>
>
>


Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi Solr users,

   I'm testing replication within SolrCloud.
   I just uncommented the replication section separately on the master and
slave node.
   The replication section setting on the master node:
<lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">startup</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
 and on the slave node:
<lst name="slave">
  <str name="masterUrl">http://10.7.23.124:8080/solr/#/</str>
  <str name="pollInterval">00:00:50</str>
</lst>

   After startup, an error appears on the slave node:
80110110 [snapPuller-70-thread-1] ERROR org.apache.solr.handler.SnapPuller
?.Master at: http://10.7.23.124:8080/solr/#/ is not available. Index fetch
failed. Exception: Invalid version (expected 2, but 60) or the data in not
in 'javabin' format


 Could anyone help me solve this problem?


regards
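A likely culprit in the slave config above: masterUrl points at the admin UI (note the trailing /#/), so the slave fetches an HTML page rather than a javabin response. That is consistent with "Invalid version (expected 2, but 60)", since 60 is the ASCII code of '<'. The masterUrl normally points at the master core's replication handler; a corrected slave section might look like this (the core name is a placeholder):

```
<lst name="slave">
  <str name="masterUrl">http://10.7.23.124:8080/solr/corename/replication</str>
  <str name="pollInterval">00:00:50</str>
</lst>
```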


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks so much for the explanation Mark, I owe you one (many)!

We have this on our high-TPS cluster and will run it through its paces
tomorrow. I'll provide any feedback I can, more soon! :D

Cheers,

Tim


Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Upayavira,

I could edit that page myself, but I need to be confirmed as human, per
http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki

My wiki account name is 'pin' just in case.

On Wed, Sep 4, 2013 at 5:27 PM, Upayavira  wrote:

> It's a wiki. Can't you correct it?
>
> Upayavira
>
> On Wed, Sep 4, 2013, at 08:25 PM, Dmitri Popov wrote:
> > Hi,
> >
> > http://wiki.apache.org/solr/XsltResponseWriter (and reference manual PDF
> > too) become out of date:
> >
> > In configuration section
> >
> > <queryResponseWriter
> >   name="xslt"
> >   class="org.apache.solr.request.XSLTResponseWriter">
> >   <int name="xsltCacheLifetimeSeconds">5</int>
> > </queryResponseWriter>
> >
> > class name
> >
> > org.apache.solr.request.XSLTResponseWriter
> >
> > should be replaced by
> >
> > org.apache.solr.response.XSLTResponseWriter
> >
> > Otherwise ClassNotFoundException happens. Change is result of
> > https://issues.apache.org/jira/browse/SOLR-1602 as far as I see.
> >
> > Apparently can't update that page myself, please could someone else do
> > that?
> >
> > Thanks!
>


Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Upayavira
It's a wiki. Can't you correct it?

Upayavira

On Wed, Sep 4, 2013, at 08:25 PM, Dmitri Popov wrote:
> Hi,
> 
> http://wiki.apache.org/solr/XsltResponseWriter (and reference manual PDF
> too) become out of date:
> 
> In configuration section
> 
> <queryResponseWriter
>   name="xslt"
>   class="org.apache.solr.request.XSLTResponseWriter">
>   <int name="xsltCacheLifetimeSeconds">5</int>
> </queryResponseWriter>
> 
> class name
> 
> org.apache.solr.request.XSLTResponseWriter
> 
> should be replaced by
> 
> org.apache.solr.response.XSLTResponseWriter
> 
> Otherwise ClassNotFoundException happens. Change is result of
> https://issues.apache.org/jira/browse/SOLR-1602 as far as I see.
> 
> Apparently can't update that page myself, please could someone else do
> that?
> 
> Thanks!


RE: Solr highlighting fragment issue

2013-09-04 Thread Bryan Loofbourrow
>> I’m having some issues with Solr search results (using Solr 1.4). I have
enabled highlighting of searched text (hl=true) and set the fragment size
to 500 (hl.fragsize=500) in the search query.

Below are the results (screen shot) shown when I searched for the term
‘grandfather’ (2 results are displayed).

Now I have a couple of problems with this.

1.   In the search results the keyword appears inconsistently towards the
start/end of the text. I’d like to control the number of characters
appearing before and after the keyword match (highlighted term). More
specifically, I’d like to get the keyword match somewhere around the
middle of the resultant text.

2.   The total number of characters appearing in the search result never
equals the fragment size I specified (500 characters). It varies
considerably (for example, 408 or 520).

Please share your thoughts on achieving the above 2 results. <<

I can’t see your screenshot, but it doesn’t really matter.



If I remember correctly how this stuff works, I think you’re going to have
a challenge getting where you want to get. In your position, I would push
back on both of those requirements rather than try to solve the problem.



For (1), the issue is that, IIRC, the highlighter breaks up your documents
into fragments BEFORE it knows where the matches are. I’d think you’d have
to pretty seriously recast the algorithm to get the result you want.



For (2), it may well be that you could tune the fragmenter to get closer to
your desired number of characters, either writing your own, or using the
available regexes and whatnot. But getting an exact number of characters
does not seem reasonable, because I’m pretty sure that there is a
constraint that a matching term must appear in its entirety in one fragment
– and also that sometimes fragments are concatenated. Imagine, for example,
a matched phrase where the start of the phrase is in one fragment, and the
end is in another. Which goes back to the first point.



So if you absolutely must have both of these (and the second one is
strange, since it implies that your fragments will often start and end in
the middles of words), then I guess you would need to rewrite the
fragmenting algorithm to drive fragmenting from the matches.



-- Bryan
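For reference, the "available regexes" Bryan mentions are the regex fragmenter's knobs. A stock solrconfig.xml configures it roughly like this (the pattern and slop shown are the shipped defaults, with hl.fragsize set to the poster's 500; treat this as a starting point to tune, not a fix for the exact-length requirement):

```
<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">500</int>
    <!-- allow fragments to deviate up to 50% from hl.fragsize
         in order to end on a pattern boundary -->
    <float name="hl.regex.slop">0.5</float>
    <!-- prefer fragments that look like sentences/phrases -->
    <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
  </lst>
</fragmenter>
```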


Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Hi,

http://wiki.apache.org/solr/XsltResponseWriter (and the reference manual PDF
too) has become out of date:

In the configuration section:

<queryResponseWriter
  name="xslt"
  class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

class name

org.apache.solr.request.XSLTResponseWriter

should be replaced by

org.apache.solr.response.XSLTResponseWriter

Otherwise a ClassNotFoundException occurs. The change is a result of
https://issues.apache.org/jira/browse/SOLR-1602, as far as I can see.

Apparently I can't update that page myself; could someone else please do that?

Thanks!


Re: Numeric fields and payload

2013-09-04 Thread PETER LENAHAN
Chris Hostetter  fucit.org> writes:

> 
> 
> : is it possible to store (text) payload to numeric fields (class 
> : solr.TrieDoubleField)?  My goal is to store measure units to numeric 
> : features - e.g. '1.5 cm' - and to use faceted search with these fields. 
> : But the field type doesn't allow analyzers to add the payload data. I 
> : want to avoid database access to load the units. I'm using Solr 4.2 .
> 
> I'm not sure if it's possible to add payloads to Trie fields, but even if 
> there is i don't think you really want that for your usecase -- i think it 
> would make a lot more sense to normalize your units so you do consistent 
> sorting, range queries, and faceting on the values regardless of wether 
> it's 100cm or 1000mm or 1m.
> 
> -Hoss
> 
> 

Hoss, what you suggest may be fine for specific units, but for monetary
values with formatting it is not realistic: $10,000.00 would require
formatting the number for display. It would be much easier to store the
formatted string as a payload.


Peter Lenahan
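If the requirement is only display formatting, an alternative to storing the formatted string as a payload is to keep the raw number in the Trie field (preserving Hoss's point about consistent sorting, range queries, and faceting) and format at render time. A plain-Java sketch:

```java
import java.text.NumberFormat;
import java.util.Locale;

// Store the raw numeric value in the Trie field; derive the display
// string at render time instead of persisting it in the index.
public class MoneyDisplay {
    public static String format(double amount) {
        return NumberFormat.getCurrencyInstance(Locale.US).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(format(10000.00)); // $10,000.00
    }
}
```

This keeps the index normalized while the UI layer controls presentation, locale, and currency symbol.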



RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Markus Jelsma
Hi Mark,

Got an issue to watch?

Thanks,
Markus
 
-Original message-
> From:Mark Miller 
> Sent: Wednesday 4th September 2013 16:55
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud 4.x hangs under high update volume
> 
> I'm going to try and fix the root cause for 4.5 - I've suspected what it is 
> since early this year, but it's never personally been an issue, so it's 
> rolled along for a long time. 
> 
> Mark
> 
> Sent from my iPhone
> 
> On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt  wrote:
> 
> > Hey guys,
> > 
> > I am looking into an issue we've been having with SolrCloud since the
> > beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
> > yet). I've noticed other users with this same issue, so I'd really like to
> > get to the bottom of it.
> > 
> > Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
> > see stalled transactions that snowball to consume all Jetty threads in the
> > JVM. This eventually causes the JVM to hang with most threads waiting on
> > the condition/stack provided at the bottom of this message. At this point
> > SolrCloud instances then start to see their neighbors (who also have all
> > threads hung) as down w/"Connection Refused", and the shards become "down"
> > in state. Sometimes a node or two survives and just returns 503s "no server
> > hosting shard" errors.
> > 
> > As a workaround/experiment, we have tuned the number of threads sending
> > updates to Solr, as well as the batch size (we batch updates from client ->
> > solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> > Client-to-Solr batching (1 update = 1 call to Solr), which also did not
> > help. Certain combinations of update threads and batch sizes seem to
> > mask/help the problem, but not resolve it entirely.
> > 
> > Our current environment is the following:
> > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > - 3 x Zookeeper instances, external Java 7 JVM.
> > - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
> > a replica of 1 shard).
> > - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
> > day.
> > - 5000 max jetty threads (well above what we use when we are healthy),
> > Linux-user threads ulimit is 6000.
> > - Occurs under Jetty 8 or 9 (many versions).
> > - Occurs under Java 1.6 or 1.7 (several minor versions).
> > - Occurs under several JVM tunings.
> > - Everything seems to point to Solr itself, and not a Jetty or Java version
> > (I hope I'm wrong).
> > 
> > The stack trace that is holding up all my Jetty QTP threads is the
> > following, which seems to be waiting on a lock that I would very much like
> > to understand further:
> > 
> > "java.lang.Thread.State: WAITING (parking)
> >at sun.misc.Unsafe.park(Native Method)
> >- parking to wait for  <0x0007216e68d8> (a
> > java.util.concurrent.Semaphore$NonfairSync)
> >at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >at
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> >at
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> >at
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> >at
> > org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> >at
> > org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> >at
> > org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> >at
> > org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> >at
> > org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
> >at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
> >at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
> >at
> > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
> >at
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
> >at
> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
> >at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
> >at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> >at
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
> >at
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
> >at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja

subindex

2013-09-04 Thread Peyman Faratin
Hi

Is there a way to build a new (smaller) index from an existing (larger) index 
where the smaller index contains a subset of the fields of the larger index? 

thank you

cleanup after OutOfMemoryError

2013-09-04 Thread Ryan McKinley
I have an application where I am calling DirectUpdateHandler2 directly with:

  update.addDoc(cmd);

This will sometimes hit:

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
at
org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
at
org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
at
org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
at
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

and then a little while later:

auto commit error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


Is there anything I can/should do to clean up after the OOME?  At a minimum
I do not want any new requests using the same IndexWriter.  Should I use:


  catch (OutOfMemoryError ex) {
    update.getCommitTracker().cancelPendingCommit();
    update.newIndexWriter(false);
    ...
  }

or perhaps 'true' for rollback?

Thanks
Ryan


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
The 'lock' or semaphore was added to cap the number of threads that would be 
used. Previously, the number of threads in use could spike to many, many 
thousands on heavy updates. A limit on the number of outstanding requests was 
put in place to keep this from happening. Something like 16 * the number of 
hosts in the cluster.

I assume the deadlock comes from the fact that requests are of two kinds - 
forward to the leader and distrib updates from the leader to replicas. Forward 
to the leader actually waits for the leader to then distrib the updates to 
replicas before returning. I believe this is what can lead to deadlock. 

This is likely why the patch for the CloudSolrServer can help the situation - 
it removes the need to forward to the leader because it sends to the correct 
leader to begin with. Only useful if you are adding docs with CloudSolrServer 
though, and more like a workaround than a fix.

The patch uses a separate 'limiting' semaphore for the two cases.

- Mark
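Mark's two-request-kinds explanation can be reduced to a toy model: if forward-to-leader and leader-to-replica requests draw permits from one bounded pool, forwards can hold every permit while each waits for a nested distrib permit. A stdlib sketch of why separate pools help (names are illustrative; this is not Solr's actual code):

```java
import java.util.concurrent.Semaphore;

// Toy model of the SolrCloud update deadlock: a "forward" request holds a
// permit while it waits for a nested "distrib" permit. With one shared pool,
// forwards can exhaust all permits so the nested acquire can never succeed.
public class UpdateSemaphoreDemo {

    // Returns true if the nested distrib acquire would succeed.
    public static boolean nestedAcquire(Semaphore forwardPool, Semaphore distribPool) {
        forwardPool.acquireUninterruptibly();   // forward-to-leader holds a permit...
        boolean ok = distribPool.tryAcquire();  // ...while needing a distrib permit
        if (ok) distribPool.release();
        forwardPool.release();
        return ok;
    }

    public static void main(String[] args) {
        // Shared pool of 1 permit: the forward's own permit starves the distrib.
        Semaphore shared = new Semaphore(1);
        System.out.println(nestedAcquire(shared, shared)); // false -> deadlock analog

        // Separate pools (the patch's approach): the nested acquire succeeds.
        System.out.println(nestedAcquire(new Semaphore(1), new Semaphore(1))); // true
    }
}
```

In the real system the pools are larger (on the order of 16 * the number of hosts, per Mark's note) and the waits are blocking rather than tryAcquire, which is why the shared-pool case hangs instead of failing fast.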

On Sep 4, 2013, at 10:22 AM, Tim Vaillancourt  wrote:

> Thanks guys! :)
> 
> Mark: this patch is much appreciated, I will try to test this shortly, 
> hopefully today.
> 
> For my curiosity/understanding, could someone explain to me quickly what 
> locks SolrCloud takes on updates? Was I on to something that more shards 
> decrease the chance for locking?
> 
> Secondly, I was wondering if someone could summarize what this patch 'fixes'? 
> I'm not too familiar with Java and the solr codebase (working on that though 
> :D).
> 
> Cheers,
> 
> Tim
> 
> 
> 
> On 4 September 2013 09:52, Mark Miller  wrote:
> There is an issue if I remember right, but I can't find it right now.
> 
> If anyone that has the problem could try this patch, that would be very
> helpful: http://pastebin.com/raw.php?i=aaRWwSGP
> 
> - Mark
> 
> 
> On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma 
> wrote:
> 
> > Hi Mark,
> >
> > Got an issue to watch?
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> > > From:Mark Miller 
> > > Sent: Wednesday 4th September 2013 16:55
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud 4.x hangs under high update volume
> > >
> > > I'm going to try and fix the root cause for 4.5 - I've suspected what it
> > is since early this year, but it's never personally been an issue, so it's
> > rolled along for a long time.
> > >
> > > Mark
> > >
> > > Sent from my iPhone
> > >
> > > On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt 
> > wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I am looking into an issue we've been having with SolrCloud since the
> > > > beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
> > 4.4.0
> > > > yet). I've noticed other users with this same issue, so I'd really
> > like to
> > > > get to the bottom of it.
> > > >
> > > > Under a very, very high rate of updates (2000+/sec), after 1-12 hours
> > we
> > > > see stalled transactions that snowball to consume all Jetty threads in
> > the
> > > > JVM. This eventually causes the JVM to hang with most threads waiting
> > on
> > > > the condition/stack provided at the bottom of this message. At this
> > point
> > > > SolrCloud instances then start to see their neighbors (who also have
> > all
> > > > threads hung) as down w/"Connection Refused", and the shards become
> > "down"
> > > > in state. Sometimes a node or two survives and just returns 503s "no
> > server
> > > > hosting shard" errors.
> > > >
> > > > As a workaround/experiment, we have tuned the number of threads sending
> > > > updates to Solr, as well as the batch size (we batch updates from
> > client ->
> > > > solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> > > > Client-to-Solr batching (1 update = 1 call to Solr), which also did not
> > > > help. Certain combinations of update threads and batch sizes seem to
> > > > mask/help the problem, but not resolve it entirely.
> > > >
> > > > Our current environment is the following:
> > > > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > > > - 3 x Zookeeper instances, external Java 7 JVM.
> > > > - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
> > and
> > > > a replica of 1 shard).
> > > > - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
> > good
> > > > day.
> > > > - 5000 max jetty threads (well above what we use when we are healthy),
> > > > Linux-user threads ulimit is 6000.
> > > > - Occurs under Jetty 8 or 9 (many versions).
> > > > - Occurs under Java 1.6 or 1.7 (several minor versions).
> > > > - Occurs under several JVM tunings.
> > > > - Everything seems to point to Solr itself, and not a Jetty or Java
> > version
> > > > (I hope I'm wrong).
> > > >
> > > > The stack trace that is holding up all my Jetty QTP threads is the
> > > > following, which seems to be waiting on a lock that I would very much
> > like
> > > > to understand further:
> > > >
> > > > "java.lang.Thread.State: WAITING (parking)
> > > >at sun.misc.Unsafe.park(Native Me

Re: cleanup after OutOfMemoryError

2013-09-04 Thread Mark Miller
I don't know that there is any 'safe' thing you can do other than restart -
but if I were to try anything, I would use true for rollback.

- Mark


On Wed, Sep 4, 2013 at 9:44 AM, Ryan McKinley  wrote:

> I have an application where I am calling DirectUpdateHandler2 directly
> with:
>
>   update.addDoc(cmd);
>
> This will sometimes hit:
>
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
> at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
> at
>
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
> at
>
> org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
> at
>
> org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
> at
>
> org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
> at
>
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
> at
>
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
> at
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
> at
>
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
> at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)
>
> and then a little while later:
>
> auto commit error...:java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
> at
>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
> at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
> at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
> at
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
> at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>
> Is there anythign I can/should do to cleanup after the OOME?  At a minimum
> I do not want any new requests using the same IndexWriter.  Should I use:
>
>
>   catch(OutOfMemoryError ex) {
>
>update.getCommitTracker().cancelPendingCommit();
>  update.newIndexWriter(false);
>  ...
>
> or perhaps 'true' for rollback?
>
> Thanks
> Ryan
>



-- 
- Mark


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks guys! :)

Mark: this patch is much appreciated, I will try to test this shortly,
hopefully today.

For my curiosity/understanding, could someone quickly explain what locks
SolrCloud takes on updates? Was I on to something in thinking that more
shards decrease the chance of locking?

Secondly, I was wondering if someone could summarize what this patch
'fixes'? I'm not too familiar with Java and the solr codebase (working on
that though :D).

Cheers,

Tim



On 4 September 2013 09:52, Mark Miller  wrote:

> There is an issue if I remember right, but I can't find it right now.
>
> If anyone that has the problem could try this patch, that would be very
> helpful: http://pastebin.com/raw.php?i=aaRWwSGP
>
> - Mark
>
>
> On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma  >wrote:
>
> > Hi Mark,
> >
> > Got an issue to watch?
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> > > From:Mark Miller 
> > > Sent: Wednesday 4th September 2013 16:55
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud 4.x hangs under high update volume
> > >
> > > I'm going to try and fix the root cause for 4.5 - I've suspected what
> it
> > is since early this year, but it's never personally been an issue, so
> it's
> > rolled along for a long time.
> > >
> > > Mark
> > >
> > > Sent from my iPhone
> > >
> > > On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt 
> > wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I am looking into an issue we've been having with SolrCloud since the
> > > > beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
> > 4.4.0
> > > > yet). I've noticed other users with this same issue, so I'd really
> > like to
> > > > get to the bottom of it.
> > > >
> > > > Under a very, very high rate of updates (2000+/sec), after 1-12 hours
> > we
> > > > see stalled transactions that snowball to consume all Jetty threads
> in
> > the
> > > > JVM. This eventually causes the JVM to hang with most threads waiting
> > on
> > > > the condition/stack provided at the bottom of this message. At this
> > point
> > > > SolrCloud instances then start to see their neighbors (who also have
> > all
> > > > threads hung) as down w/"Connection Refused", and the shards become
> > "down"
> > > > in state. Sometimes a node or two survives and just returns 503s "no
> > server
> > > > hosting shard" errors.
> > > >
> > > > As a workaround/experiment, we have tuned the number of threads
> sending
> > > > updates to Solr, as well as the batch size (we batch updates from
> > client ->
> > > > solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> > > > Client-to-Solr batching (1 update = 1 call to Solr), which also did
> not
> > > > help. Certain combinations of update threads and batch sizes seem to
> > > > mask/help the problem, but not resolve it entirely.
> > > >
> > > > Our current environment is the following:
> > > > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > > > - 3 x Zookeeper instances, external Java 7 JVM.
> > > > - 1 collection, 3 shards, 2 replicas (each node is a leader of 1
> shard
> > and
> > > > a replica of 1 shard).
> > > > - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
> > good
> > > > day.
> > > > - 5000 max jetty threads (well above what we use when we are
> healthy),
> > > > Linux-user threads ulimit is 6000.
> > > > - Occurs under Jetty 8 or 9 (many versions).
> > > > - Occurs under Java 1.6 or 1.7 (several minor versions).
> > > > - Occurs under several JVM tunings.
> > > > - Everything seems to point to Solr itself, and not a Jetty or Java
> > version
> > > > (I hope I'm wrong).
> > > >
> > > > The stack trace that is holding up all my Jetty QTP threads is the
> > > > following, which seems to be waiting on a lock that I would very much
> > like
> > > > to understand further:
> > > >
> > > > "java.lang.Thread.State: WAITING (parking)
> > > >at sun.misc.Unsafe.park(Native Method)
> > > >- parking to wait for  <0x0007216e68d8> (a
> > > > java.util.concurrent.Semaphore$NonfairSync)
> > > >at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > >at
> > > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> > > >at
> > > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> > > >at
> > > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> > > >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> > > >at
> > > >
> >
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> > > >at
> > > >
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> > > >at
> > > >
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> > > >at
> > > >
> >
> org.apache.solr.update.SolrCmdDist

Questions about Replication Factor on solrcloud

2013-09-04 Thread Lisandro Montaño
Hi all,

I'm currently working on deploying a SolrCloud distribution on CentOS
machines and would like some more guidance on replication factor
configuration.

I have configured two servers running SolrCloud under Tomcat and a third
server as ZooKeeper. The setup was successful: one server hosts collection1
and the other hosts collection1_Shard1_Replica1.

My questions are:

-  Can I have 1 shard and 2 replicas on two machines? What are the
limitations or considerations when defining this?

-  How do replicas work? (There is not much information about this.)

-  When I import data into collection1 it works properly, but when I
do it into collection1_Shard1_Replica1 it fails. Is that expected behavior?
(With a better definition of replicas I might understand this better.)

Thanks in advance for your help and guidance.

Regards,

Lisandro Montano



Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
It would be great if you could give this patch a try:
http://pastebin.com/raw.php?i=aaRWwSGP

- Mark


On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn  wrote:

> Thanks. If there is anything I can do to help you resolve this issue, let
> me know.
>
> -Kevin
>
>
> On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller  wrote:
>
> > I'll look at fixing the root issue for 4.5. I've been putting it off for
> > way too long.
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 3, 2013, at 2:15 PM, Kevin Osborn  wrote:
> >
> > > I was having problems updating SolrCloud with a large batch of records.
> > The
> > > records are coming in bursts with lulls between updates.
> > >
> > > At first, I just tried large updates of 100,000 records at a time.
> > > Eventually, this caused Solr to hang. When hung, I can still query
> Solr.
> > > But I cannot do any deletes or other updates to the index.
> > >
> > > At first, my updates were going as SolrJ CSV posts. I have also tried
> > local
> > > file updates and had similar results. I finally slowed things down to
> > just
> > > use SolrJ's Update feature, which is basically just JavaBin. I am also
> > > sending over just 100 at a time in 10 threads. Again, it eventually
> hung.
> > >
> > > Sometimes, Solr hangs in the first couple of chunks. Other times, it
> > hangs
> > > right away.
> > >
> > > These are my commit settings:
> > >
> > > <autoCommit>
> > >   <maxTime>15000</maxTime>
> > >   <maxDocs>5000</maxDocs>
> > >   <openSearcher>false</openSearcher>
> > > </autoCommit>
> > > <autoSoftCommit>
> > >   <maxTime>3</maxTime>
> > > </autoSoftCommit>
> > >
> > > I have tried quite a few variations with the same results. I also tried
> > > various JVM settings with the same results. The only variable seems to
> be
> > > that reducing the cluster size from 2 to 1 is the only thing that
> helps.
> > >
> > > I also did a jstack trace. I did not see any explicit deadlocks, but I
> > did
> > > see quite a few threads in WAITING or TIMED_WAITING. It is typically
> > > something like this:
> > >
> > >  java.lang.Thread.State: WAITING (parking)
> > >at sun.misc.Unsafe.park(Native Method)
> > >- parking to wait for  <0x00074039a450> (a
> > > java.util.concurrent.Semaphore$NonfairSync)
> > >at
> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > >at
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> > >at
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> > >at
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> > >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> > >at
> > >
> >
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> > >at
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> > >at
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> > >at
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> > >at
> > >
> >
> org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
> > >at
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
> > >at
> > >
> >
> org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
> > >at
> > >
> >
> org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
> > >at
> > >
> org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
> > >at
> > org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
> > >at
> > >
> >
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > >at
> > >
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > >at
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> > >at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> > >at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> > >at
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> > >at
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> > >at
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> > >at
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > >at
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
> > >  

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
There is an issue if I remember right, but I can't find it right now.

If anyone that has the problem could try this patch, that would be very
helpful: http://pastebin.com/raw.php?i=aaRWwSGP

- Mark


On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma wrote:

> Hi Mark,
>
> Got an issue to watch?
>
> Thanks,
> Markus
>
> -Original message-
> > From:Mark Miller 
> > Sent: Wednesday 4th September 2013 16:55
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud 4.x hangs under high update volume
> >
> > I'm going to try and fix the root cause for 4.5 - I've suspected what it
> is since early this year, but it's never personally been an issue, so it's
> rolled along for a long time.
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt 
> wrote:
> >
> > > Hey guys,
> > >
> > > I am looking into an issue we've been having with SolrCloud since the
> > > beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
> 4.4.0
> > > yet). I've noticed other users with this same issue, so I'd really
> like to
> > > get to the bottom of it.
> > >
> > > Under a very, very high rate of updates (2000+/sec), after 1-12 hours
> we
> > > see stalled transactions that snowball to consume all Jetty threads in
> the
> > > JVM. This eventually causes the JVM to hang with most threads waiting
> on
> > > the condition/stack provided at the bottom of this message. At this
> point
> > > SolrCloud instances then start to see their neighbors (who also have
> all
> > > threads hung) as down w/"Connection Refused", and the shards become
> "down"
> > > in state. Sometimes a node or two survives and just returns 503s "no
> server
> > > hosting shard" errors.
> > >
> > > As a workaround/experiment, we have tuned the number of threads sending
> > > updates to Solr, as well as the batch size (we batch updates from
> client ->
> > > solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> > > Client-to-Solr batching (1 update = 1 call to Solr), which also did not
> > > help. Certain combinations of update threads and batch sizes seem to
> > > mask/help the problem, but not resolve it entirely.
> > >
> > > Our current environment is the following:
> > > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > > - 3 x Zookeeper instances, external Java 7 JVM.
> > > - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard
> and
> > > a replica of 1 shard).
> > > - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
> good
> > > day.
> > > - 5000 max jetty threads (well above what we use when we are healthy),
> > > Linux-user threads ulimit is 6000.
> > > - Occurs under Jetty 8 or 9 (many versions).
> > > - Occurs under Java 1.6 or 1.7 (several minor versions).
> > > - Occurs under several JVM tunings.
> > > - Everything seems to point to Solr itself, and not a Jetty or Java
> version
> > > (I hope I'm wrong).
> > >
> > > The stack trace that is holding up all my Jetty QTP threads is the
> > > following, which seems to be waiting on a lock that I would very much
> like
> > > to understand further:
> > >
> > > "java.lang.Thread.State: WAITING (parking)
> > >at sun.misc.Unsafe.park(Native Method)
> > >- parking to wait for  <0x0007216e68d8> (a
> > > java.util.concurrent.Semaphore$NonfairSync)
> > >at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > >at
> > >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> > >at
> > >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> > >at
> > >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> > >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> > >at
> > >
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> > >at
> > >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> > >at
> > >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> > >at
> > >
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> > >at
> > >
> org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
> > >at
> > >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
> > >at
> > >
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
> > >at
> > >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
> > >at
> > >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
> > >at
> > >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispa

Solr highlighting fragment issue

2013-09-04 Thread Sreehareesh Kaipravan Meethaleveetil
Hi,
I'm having some issues with Solr search results (using Solr 1.4). I have 
enabled highlighting of searched text (hl=true) and set the fragment size to 
500 (hl.fragsize=500) in the search query.
Below is a screenshot of the results shown when I searched for the term 
'grandfather' (2 results are displayed).
Now I have a couple of problems with this.

1.   In the search results the keyword appears inconsistently towards the 
start/end of the text. I'd like to control the number of characters 
appearing before and after the keyword match (highlighted term). More 
specifically, I'd like the keyword match to fall somewhere around the middle 
of the resultant text.

2.   The total number of characters appearing in a search result never 
equals the fragment size I specified (500 characters). It varies considerably 
(for example, 408 or 520).
Please share your thoughts on achieving the above 2 results.
Thanks & Regards,
Sreehareesh KM
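For reference on both points: hl.fragsize is a target size, not an exact length, because the fragmenter breaks fragments on token boundaries, which is why 408- or 520-character fragments appear. Below is a hedged sketch of solrconfig.xml request-handler defaults (the /search handler name and the "content" field are assumptions, not taken from this message) using the regex fragmenter, whose slop setting bounds how far fragments may deviate from the target:

```xml
<!-- Hypothetical sketch; /search and the "content" field are assumptions. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <!-- A target size only: fragments end on token/pattern boundaries. -->
    <str name="hl.fragsize">500</str>
    <!-- The regex fragmenter snaps fragments to sentence-like chunks;
         slop is the allowed deviation factor from hl.fragsize. -->
    <str name="hl.fragmenter">regex</str>
    <str name="hl.regex.slop">0.2</str>
    <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
  </lst>
</requestHandler>
```

The standard highlighter has no parameter to center the match within the fragment, so exact keyword centering generally requires trimming on the client side.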


How to config SOLR server for spell check functionality

2013-09-04 Thread sebastian.manolescu
I want to implement the spell check functionality offered by Solr against a
MySQL database, but I don't understand how.
Here is the basic flow of what I want to do.

I have a simple inputText (in JSF), and if I type the word "shwo" the response
in the outputLabel should be "show".

First of all I'm using the following tools and frameworks:

JBoss application server 6.1.
Eclipse
JPA
JSF(Primefaces)

Steps I've done until now:

Step 1: Download solr server from:
http://lucene.apache.org/solr/downloads.html Extract content.

Step 2: Add an environment variable:

Variable name: solr.solr.home
Variable value: D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr --- where you have
the extracted Solr server

Step 3:

Open the solr war and add an env-entry to solr.war\WEB-INF\web.xml (the easy way):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

OR import the project, make the change, and build the war.

Step 4: Browser: localhost:8080/solr/

And the solr console appears.

Until now all works well.

I have found some useful code (in my opinion) that returns:

[collection1] webapp=/solr path=/spell
params={spellcheck=on&q=whatever&wt=javabin&qt=/spell&version=2&spellcheck.build=true}
hits=0 status=0 QTime=16

Here is the code that gives the result from above:

SolrServer solr;
try {
solr = new CommonsHttpSolrServer("http://localhost:8080/solr";);

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/spell");
params.set("q", "whatever");
params.set("spellcheck", "on");
params.set("spellcheck.build", "true");

QueryResponse response = solr.query(params);
SpellCheckResponse spellCheckResponse =
response.getSpellCheckResponse();
if (!spellCheckResponse.isCorrectlySpelled()) {
for (Suggestion suggestion :
response.getSpellCheckResponse().getSuggestions()) {
   System.out.println("original token: " + suggestion.getToken() + "
- alternatives: " + suggestion.getAlternatives());
}
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

Questions:

1. How do I make the connection with my DB and search its content to see if
there are any words that could match?
2. How do I set up the configuration (solrconfig.xml, schema.xml, etc.)?
3. How do I send a string from my view (xhtml) so that the Solr server knows
what to look for?

I have read all the following information about Solr, but it's still unclear:

Links:Main Page:
http://lucene.apache.org/solr/

Main Page tutorial: http://lucene.apache.org/solr/4_4_0/tutorial.html

Solr Wiki:
http://wiki.apache.org/solr/Solrj --- official solrj documentation
http://wiki.apache.org/solr/SpellCheckComponent

Solr config: http://wiki.apache.org/solr/SolrConfigXml
http://www.installationpage.com/solr/solr-configuration-tutorial-schema-solrconfig-xml/
http://wiki.apache.org/solr/SchemaXml

StackOverflow proof: Solr Did you mean (Spell check component)

Solr Database Integration:
http://www.slideshare.net/th0masr/integrating-the-solr-search-engine
http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/

Solr Spell Check:
http://docs.lucidworks.com/display/solr/Spell+Checking
http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/
http://techiesinsight.blogspot.ro/2012/06/using-solr-spellchecker-from-java.html
http://blog.websolr.com/post/2748574298/spellcheck-with-solr-spellcheckcomponent
How to use SpellingResult class in SolrJ

I really need your help. Regards.
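As a hedged sketch for question 2 (not from this thread; the "spell" field and "textSpell" type are assumptions, typically set up as a copyField target in schema.xml), a minimal index-based spellchecker could be declared in solrconfig.xml as follows. The database content itself would reach that field separately, for example via the DataImportHandler with a JdbcDataSource, which covers question 1 at the indexing level:

```xml
<!-- Hypothetical sketch; "spell" would be a copyField target in schema.xml. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>  <!-- dictionary source field -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- The /spell handler that the SolrJ snippet above targets via qt=/spell. -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

For question 3, the JSF backing bean passes the user's input as the q parameter, exactly as params.set("q", ...) does in the SolrJ snippet above.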




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-config-SOLR-server-for-spell-check-functionality-tp4088163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr performance against oracle

2013-09-04 Thread Toke Eskildsen
On Wed, 2013-09-04 at 14:06 +0200, Sergio Stateri wrote:
> I'm trying to change the data access in the company where I work from
> Oracle to Solr.

They work on different principles and fulfill different needs. Comparing
them with a performance-oriented test is not likely to yield a useful basis
for selecting between them. Start by describing your typical use cases
instead.

> Solr always returns the data in around 150~200 ms (from localhost), but Oracle
> returns in around 20 ms (and the Oracle server is in another company; I'm using
> a dedicated link to access it).

200ms is suspiciously slow for a trivial lookup in 800,000 values. I am
sure we can bring that down to Oracle-time or better, but I do not think
it shows much.

> How can I tell my managers that I'd like to use Solr?

Why would you like to use Solr?



Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0

2013-09-04 Thread Sukanta Dey
Hi Team,

In my project I am going to use Apache Solr 4.4.0 for searching. As part of
that I need to join multiple Solr documents within the same core on a field
that is common across the documents.
I can join the documents successfully using the Solr 4.4.0 join syntax, and it
returns the expected result. However, my next requirement is to sort the
returned results by fields from the documents involved in the join condition's
"from" clause, which I was not able to achieve. Let me explain the problem in
detail along with the files I am using ...


1)  Files being used :

a.   Picklist_1.xml

--



t1324838

7

956

130712901

Draft

Draoft





b.  Picklist_2.xml

---



t1324837

7

87749

130712901

New

Neuo





c.   AssetID_1.xml

---



t1324837

a180894808

1

true

2013-09-02T09:28:18Z

130713716

130712901





d.  AssetID_2.xml





 t1324838

 a171658357

1

130713716

2283961

2290309

7

7

13503796
15485964

38052

41133

130712901





2)  Requirement:



i.   It needs a join between the files using the "def14227_picklist" field
from AssetID_1.xml and AssetID_2.xml and the "describedObjectId" field from
Picklist_1.xml and Picklist_2.xml.

ii.  After joining we need all the fields from the AssetID_*.xml files and
the "en" and "gr" fields from the Picklist_*.xml files.

iii. While joining we also need to sort the result based on the "en" field
value.



3)  I was trying with "q={!join from=inner_id to=outer_id}zzz:vvv" syntax 
but no luck.

Any help/suggestion would be appreciated.

Thanks,
Sukanta Dey
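One note on requirements (ii) and (iii): Solr's query-time join returns only documents from the "to" side; fields of the "from"-side documents are not carried into the results, so sorting by a "from"-side field such as "en" is not something {!join} supports by itself. A common workaround is to denormalize at index time, copying the picklist's "en" value onto each asset document and sorting on that copy. Below is a sketch of the join direction implied by the files above (the handler name and the query value "New" are illustrative assumptions):

```xml
<!-- Hypothetical handler; matches picklist docs by "en", returns asset docs. -->
<requestHandler name="/joinsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">{!join from=describedObjectId to=def14227_picklist}en:New</str>
  </lst>
</requestHandler>
```

The same query can of course be sent ad hoc as the q parameter rather than baked into a handler.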






Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
I'll look at fixing the root issue for 4.5. I've been putting it off for way
too long.

Mark 

Sent from my iPhone

On Sep 3, 2013, at 2:15 PM, Kevin Osborn  wrote:

> I was having problems updating SolrCloud with a large batch of records. The
> records are coming in bursts with lulls between updates.
> 
> At first, I just tried large updates of 100,000 records at a time.
> Eventually, this caused Solr to hang. When hung, I can still query Solr.
> But I cannot do any deletes or other updates to the index.
> 
> At first, my updates were going as SolrJ CSV posts. I have also tried local
> file updates and had similar results. I finally slowed things down to just
> use SolrJ's Update feature, which is basically just JavaBin. I am also
> sending over just 100 at a time in 10 threads. Again, it eventually hung.
> 
> Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs
> right away.
> 
> These are my commit settings:
> 
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <maxDocs>5000</maxDocs>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>3</maxTime>
> </autoSoftCommit>
> 
> I have tried quite a few variations with the same results. I also tried
> various JVM settings with the same results. The only variable seems to be
> that reducing the cluster size from 2 to 1 is the only thing that helps.
> 
> I also did a jstack trace. I did not see any explicit deadlocks, but I did
> see quite a few threads in WAITING or TIMED_WAITING. It is typically
> something like this:
> 
>  java.lang.Thread.State: WAITING (parking)
>at sun.misc.Unsafe.park(Native Method)
>- parking to wait for  <0x00074039a450> (a
> java.util.concurrent.Semaphore$NonfairSync)
>at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
>at
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
>at
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
>at
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
>at
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
>at
> org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
>at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
>at
> org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
>at
> org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
>at
> org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
>at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
>at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
>at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> 
> It basically appears that Solr gets stuck while trying to acquire a

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Kevin Osborn
Thanks. If there is anything I can do to help you resolve this issue, let
me know.

-Kevin


On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller  wrote:

> I'll look at fixing the root issue for 4.5. I've been putting it off for
> way too long.
>
> Mark
>
> Sent from my iPhone
>
> On Sep 3, 2013, at 2:15 PM, Kevin Osborn  wrote:
>
> > I was having problems updating SolrCloud with a large batch of records.
> The
> > records are coming in bursts with lulls between updates.
> >
> > At first, I just tried large updates of 100,000 records at a time.
> > Eventually, this caused Solr to hang. When hung, I can still query Solr.
> > But I cannot do any deletes or other updates to the index.
> >
> > At first, my updates were going as SolrJ CSV posts. I have also tried
> local
> > file updates and had similar results. I finally slowed things down to
> just
> > use SolrJ's Update feature, which is basically just JavaBin. I am also
> > sending over just 100 at a time in 10 threads. Again, it eventually hung.
> >
> > Sometimes, Solr hangs in the first couple of chunks. Other times, it
> hangs
> > right away.
> >
> > These are my commit settings:
> >
> > <autoCommit>
> >   <maxTime>15000</maxTime>
> >   <maxDocs>5000</maxDocs>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> > <autoSoftCommit>
> >   <maxTime>3</maxTime>
> > </autoSoftCommit>
> >
> > I have tried quite a few variations with the same results. I also tried
> > various JVM settings with the same results. The only variable seems to be
> > that reducing the cluster size from 2 to 1 is the only thing that helps.
> >
> > I also did a jstack trace. I did not see any explicit deadlocks, but I
> did
> > see quite a few threads in WAITING or TIMED_WAITING. It is typically
> > something like this:
> >
> >  java.lang.Thread.State: WAITING (parking)
> >at sun.misc.Unsafe.park(Native Method)
> >- parking to wait for  <0x00074039a450> (a
> > java.util.concurrent.Semaphore$NonfairSync)
> >at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >at
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> >at
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> >at
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> >at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> >at
> >
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> >at
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> >at
> >
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> >at
> >
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> >at
> >
> org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
> >at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
> >at
> >
> org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
> >at
> >
> org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
> >at
> > org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
> >at
> org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
> >at
> >
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> >at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> >at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
> >at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> >at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> >at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Kevin Osborn
I am having this issue as well. I did apply this patch. Unfortunately, it
did not resolve the issue in my case.


On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters
wrote:

> Tim,
>
> Take a look at
> http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and
> https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue
> that you're reporting for a while then I applied the patch from SOLR-4816
> to my clients and the problems went away. If you don't feel like applying
> the patch it looks like it should be included in the release of version
> 4.5. Also note that the problem happens more frequently when the
> replication factor is greater than 1.
>
> Thanks,
> Greg
>
> -Original Message-
> From: Tim Vaillancourt [mailto:t...@elementspace.com]
> Sent: Tuesday, September 03, 2013 6:31 PM
> To: solr-user@lucene.apache.org
> Subject: SolrCloud 4.x hangs under high update volume
>
> Hey guys,
>
> I am looking into an issue we've been having with SolrCloud since the
> beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
> yet). I've noticed other users with this same issue, so I'd really like to
> get to the bottom of it.
>
> Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
> see stalled transactions that snowball to consume all Jetty threads in the
> JVM. This eventually causes the JVM to hang with most threads waiting on
> the condition/stack provided at the bottom of this message. At this point
> SolrCloud instances then start to see their neighbors (who also have all
> threads hung) as down w/"Connection Refused", and the shards become "down"
> in state. Sometimes a node or two survives and just returns 503s "no
> server hosting shard" errors.
>
> As a workaround/experiment, we have tuned the number of threads sending
> updates to Solr, as well as the batch size (we batch updates from client ->
> solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> Client-to-Solr batching (1 update = 1 call to Solr), which also did not
> help. Certain combinations of update threads and batch sizes seem to
> mask/help the problem, but not resolve it entirely.
>
> Our current environment is the following:
> - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> - 3 x Zookeeper instances, external Java 7 JVM.
> - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
> a replica of 1 shard).
> - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
> day.
> - 5000 max jetty threads (well above what we use when we are healthy),
> Linux-user threads ulimit is 6000.
> - Occurs under Jetty 8 or 9 (many versions).
> - Occurs under Java 1.6 or 1.7 (several minor versions).
> - Occurs under several JVM tunings.
> - Everything seems to point to Solr itself, and not a Jetty or Java
> version (I hope I'm wrong).
>
> The stack trace that is holding up all my Jetty QTP threads is the
> following, which seems to be waiting on a lock that I would very much like
> to understand further:
>
> "java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007216e68d8> (a
> java.util.concurrent.Semaphore$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
> at
>
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
> at
>
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
> at
>
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
> at
>
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
> at
>
> org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
> at
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
> at
>
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
> at
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispat

Re: Boost by numFounds

2013-09-04 Thread Flavio Pompermaier
I found that what can do the trick for page-rank-like indexing is
externalFileField! Is there any helper to upload the external files to all
Solr servers (in Solr 3 and SolrCloud)?
Or should I copy the file to each Solr instance's data folder and then reload
their caches?
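For reference, externalFileField is declared in schema.xml roughly like this (a sketch; the field name "rank" and all values here are illustrative, not from this thread):

```xml
<!-- schema.xml: keyField ties each line of the external file to the
     document whose "id" matches; defVal is used for missing keys -->
<fieldType name="externalRank" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="rank" type="externalRank" indexed="false" stored="false"/>
```

The values then live in a plain text file named external_rank (or external_rank.txt) in each core's data directory, one docid=value line per document (e.g. doc1=2.5). There is no built-in upload mechanism, so copying the file to every instance's data directory and issuing a commit (which opens a new searcher and re-reads the file) is indeed the usual approach.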

On Sat, Aug 24, 2013 at 12:36 AM, Flavio Pompermaier
wrote:

> Any help..? Is it possible to add this pagerank-like behaviour?
>
>


Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
I'm going to try and fix the root cause for 4.5 - I've suspected what it is 
since early this year, but it's never personally been an issue, so it's rolled 
along for a long time. 

Mark

Sent from my iPhone

On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt  wrote:

> Hey guys,
> 
> I am looking into an issue we've been having with SolrCloud since the
> beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0
> yet). I've noticed other users with this same issue, so I'd really like to
> get to the bottom of it.
> 
> Under a very, very high rate of updates (2000+/sec), after 1-12 hours we
> see stalled transactions that snowball to consume all Jetty threads in the
> JVM. This eventually causes the JVM to hang with most threads waiting on
> the condition/stack provided at the bottom of this message. At this point
> SolrCloud instances then start to see their neighbors (who also have all
> threads hung) as down w/"Connection Refused", and the shards become "down"
> in state. Sometimes a node or two survives and just returns 503s "no server
> hosting shard" errors.
> 
> As a workaround/experiment, we have tuned the number of threads sending
> updates to Solr, as well as the batch size (we batch updates from client ->
> solr), and the Soft/Hard autoCommits, all to no avail. We also turned off
> Client-to-Solr batching (1 update = 1 call to Solr), which did not
> help. Certain combinations of update threads and batch sizes seem to
> mask/help the problem, but not resolve it entirely.
> 
> Our current environment is the following:
> - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> - 3 x Zookeeper instances, external Java 7 JVM.
> - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and
> a replica of 1 shard).
> - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good
> day.
> - 5000 max jetty threads (well above what we use when we are healthy),
> Linux-user threads ulimit is 6000.
> - Occurs under Jetty 8 or 9 (many versions).
> - Occurs under Java 1.6 or 1.7 (several minor versions).
> - Occurs under several JVM tunings.
> - Everything seems to point to Solr itself, and not a Jetty or Java version
> (I hope I'm wrong).
> 
> The stack trace that is holding up all my Jetty QTP threads is the
> following, which seems to be waiting on a lock that I would very much like
> to understand further:
> 
> "java.lang.Thread.State: WAITING (parking)
>at sun.misc.Unsafe.park(Native Method)
>- parking to wait for  <0x0007216e68d8> (a
> java.util.concurrent.Semaphore$NonfairSync)
>at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
>at
> org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
>at
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
>at
> org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
>at
> org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
>at
> org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
>at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
>at
> org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
>at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
>at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
>at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
>at
> org.eclipse.jetty.server.session.SessionHandler.doScope
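The lock being waited on in the trace above is a plain java.util.concurrent.Semaphore inside SolrCmdDistributor; once its permits are exhausted, every further submit parks exactly as shown. A minimal standalone illustration of the primitive (not Solr code, just the JDK class it wraps):

```java
import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    // Returns whether a second non-blocking acquire succeeds once the single
    // permit is taken. In Solr the blocking acquire() performs the same wait
    // without the early return -- that is the parked frame in the trace.
    static boolean secondAcquireSucceeds() {
        Semaphore sem = new Semaphore(1);  // AdjustableSemaphore wraps a Semaphore
        boolean first = sem.tryAcquire();  // the in-flight update holds the permit
        boolean second = sem.tryAcquire(); // a further update would park here
        sem.release();
        return first && second;
    }

    public static void main(String[] args) {
        System.out.println(secondAcquireSucceeds()); // prints false
    }
}
```

If permits are never released (e.g. a distributed update that never completes), every thread that reaches acquire() stays parked forever, which matches the snowballing thread exhaustion described.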

Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
Or could I use a filter in schema.xml where I define a field type and use some 
filter that understands XPath?

On 4. Sep 2013, at 11:52 AM, Shalin Shekhar Mangar wrote:

> No that wouldn't work. It seems that you probably need a custom
> Transformer to extract the right div content. I do not know if
> TikaEntityProcessor supports such a thing.
> 
> On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen  wrote:
>> So could I just nest it in an XPathEntityProcessor to filter the HTML, or is 
>> there something like XPath for Tika?
>> 
>> > forEach="/div[@id='content']" dataSource="main">
>>> url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" 
>> format="html" >
>>
>>
>>
>> 
>> But now I don't know how to pass the text to Tika; what do I put in url and 
>> dataSource?
>> 
>> 
>> On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:
>> 
>>> I don't know much about Tika but in the example data-config.xml that
>>> you posted, the "xpath" attribute on the field "text" won't work
>>> because the xpath attribute is used only by a XPathEntityProcessor.
>>> 
>>> On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen  wrote:
 I want tika to only index the content in ... for 
 the field "text". Unfortunately it's indexing the whole page. Can't XPath 
 do this?
 
 data-config.xml:
 
 
   
   
   
 
   >>> url="http://127.0.0.1/tkb/internet/docImportUrl.xml"; forEach="/docs/doc" 
 dataSource="main"> 
   
   
   
   
   
   
 
   >>> url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip" 
 htmlMapper="identity" format="html" >
   
 
   
   
 
 
>>> 
>>> 
>>> 
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
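For what it's worth, the core of such a custom Transformer could be a small HTML-snipping helper along these lines (a hypothetical sketch; a naive regex like this breaks on nested divs, so a real implementation would likely use an HTML parser instead):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivExtractor {
    // Hypothetical helper: pull the inner HTML of <div id="...">...</div>.
    // A custom DIH Transformer could call something like this from
    // transformRow() to replace the full page text with one div's content.
    static String extractDiv(String html, String id) {
        Pattern p = Pattern.compile(
            "<div[^>]*id=\"" + Pattern.quote(id) + "\"[^>]*>(.*?)</div>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String html = "<html><div id=\"nav\">menu</div>"
                    + "<div id=\"content\">Hello World</div></html>";
        System.out.println(extractDiv(html, "content")); // prints Hello World
    }
}
```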



Re: Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
Yep ignoring stop words. Thanks for the pointer.

Alistair

-
mov eax,1
mov ebx,0
int 80




On 04/09/2013 13:43, "Jack Krupansky"  wrote:

>Do you have stop word filtering enabled? What does your field type look
>like?
>
>If stop words are ignored, you will get exactly the behavior you
>described.
>
>-- Jack Krupansky
>
>-Original Message-
>From: Alistair Young
>Sent: Wednesday, September 04, 2013 6:57 AM
>To: solr-user@lucene.apache.org
>Subject: Strange behaviour with single word and phrase
>
>I wonder if anyone could point me in the right direction please?
>
>If I search on the phrase "the toolkit" I get hits containing that phrase
>but also hits that have the word 'the' before the word 'toolkit', no
>matter 
>how far apart they are.
>
>Also, if I search on the word 'the' there are no hits at all.
>
>Thanks,
>
>Alistair
>
>-
>mov eax,1
>mov ebx,0
>int 80 
>
>




RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Greg Walters
Tim,

Take a look at 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
 and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that 
you're reporting for a while then I applied the patch from SOLR-4816 to my 
clients and the problems went away. If you don't feel like applying the patch 
it looks like it should be included in the release of version 4.5. Also note 
that the problem happens more frequently when the replication factor is greater 
than 1.

Thanks,
Greg

-Original Message-
From: Tim Vaillancourt [mailto:t...@elementspace.com] 
Sent: Tuesday, September 03, 2013 6:31 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud 4.x hangs under high update volume

Hey guys,

I am looking into an issue we've been having with SolrCloud since the beginning 
of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've 
noticed other users with this same issue, so I'd really like to get to the 
bottom of it.

Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see 
stalled transactions that snowball to consume all Jetty threads in the JVM. 
This eventually causes the JVM to hang with most threads waiting on the 
condition/stack provided at the bottom of this message. At this point SolrCloud 
instances then start to see their neighbors (who also have all threads hung) as 
down w/"Connection Refused", and the shards become "down"
in state. Sometimes a node or two survives and just returns 503s "no server 
hosting shard" errors.

As a workaround/experiment, we have tuned the number of threads sending updates 
to Solr, as well as the batch size (we batch updates from client -> solr), and 
the Soft/Hard autoCommits, all to no avail. We also turned off Client-to-Solr batching 
(1 update = 1 call to Solr), which did not help. Certain combinations of 
update threads and batch sizes seem to mask/help the problem, but not resolve 
it entirely.

Our current environment is the following:
- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x Zookeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a 
replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max jetty threads (well above what we use when we are healthy), 
Linux-user threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I 
hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the following, 
which seems to be waiting on a lock that I would very much like to understand 
further:

"java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0007216e68d8> (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.jav

RE: Solr Cloud hangs when replicating updates

2013-09-04 Thread Greg Walters
Kevin,

Take a look at 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html
 and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that 
you're reporting for a while then I applied the patch from SOLR-4816 to my 
clients and the problems went away. If you don't feel like applying the patch 
it looks like it should be included in the release of version 4.5. Also note 
that the problem happens more frequently when the replication factor is greater 
than 1.

Thanks,
Greg

-Original Message-
From: kevin.osb...@cbsinteractive.com [mailto:kevin.osb...@cbsinteractive.com] 
On Behalf Of Kevin Osborn
Sent: Tuesday, September 03, 2013 4:16 PM
To: solr-user
Subject: Solr Cloud hangs when replicating updates

I was having problems updating SolrCloud with a large batch of records. The 
records are coming in bursts with lulls between updates.

At first, I just tried large updates of 100,000 records at a time.
Eventually, this caused Solr to hang. When hung, I can still query Solr.
But I cannot do any deletes or other updates to the index.

At first, my updates were going as SolrJ CSV posts. I have also tried local 
file updates and had similar results. I finally slowed things down to just use 
SolrJ's Update feature, which is basically just JavaBin. I am also sending over 
just 100 at a time in 10 threads. Again, it eventually hung.

Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs 
right away.

These are my commit settings:


   15000
   5000
   false
 

 3
   

I have tried quite a few variations with the same results. I also tried various 
JVM settings with the same results. The only variable seems to be that reducing 
the cluster size from 2 to 1 is the only thing that helps.

I also did a jstack trace. I did not see any explicit deadlocks, but I did see 
quite a few threads in WAITING or TIMED_WAITING. It is typically something like 
this:

  java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00074039a450> (a
java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at
org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at
org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at
org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at
org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at
org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at
org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org

Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Jack Krupansky

Did you restart Solr after editing config and schema?

-- Jack Krupansky

-Original Message- 
From: Nutan

Sent: Wednesday, September 04, 2013 3:07 AM
To: solr-user@lucene.apache.org
Subject: unknown _stream_source_info while indexing rich doc in solr

I am using Solr 4.2 on Windows 7.
My schema is:









solrconfig.xml :


contents
true
ignored_
true



When I execute:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true";
-F "myfile=@abc.txt"

I get the error: unknown field ignored_stream_source_info.

I referred to Solr Cookbook 3.1 and Solr Cookbook 4 but the error is not resolved.
Please help me.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com. 
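For reference, the usual way to make the extract handler tolerate Tika metadata fields (like stream_source_info) that are not in the schema is a catch-all dynamic field, combined with the uprefix parameter (a sketch; assumes an "ignored" field type is defined in schema.xml):

```xml
<!-- schema.xml: unknown Tika metadata prefixed with "ignored_" matches
     this dynamic field and is silently dropped instead of raising
     "unknown field" errors -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```

With uprefix=ignored_ set on the /update/extract request (either in the URL or as a default in solrconfig.xml), any metadata field not already in the schema is remapped onto ignored_* and discarded.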



Re: Strange behaviour with single word and phrase

2013-09-04 Thread Jack Krupansky
Do you have stop word filtering enabled? What does your field type look 
like?


If stop words are ignored, you will get exactly the behavior you described.

-- Jack Krupansky

-Original Message- 
From: Alistair Young

Sent: Wednesday, September 04, 2013 6:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behaviour with single word and phrase

I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase 
but also hits that have the word 'the' before the word 'toolkit', no matter 
how far apart they are.


Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80 
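For reference, the behaviour described falls out of an analyzer chain like this generic sketch (not necessarily the actual field type in question): StopFilterFactory removes "the" at both index and query time, so it can never match on its own, and it leaves position holes that phrase matching can step across.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- removing this filter (or taking "the" out of stopwords.txt) makes
         "the" searchable again, at the cost of a larger index -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```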



Re: solr performance against oracle

2013-09-04 Thread Andrea Gazzarini
You said nothing about your environments (e.g. operating systems, what 
kind of Oracle installation you have, what kind of SOLR installation, 
how much data is in the database, how many documents are in the index, 
RAM for SOLR, for Oracle, for the OS, and the hardware in general... and so on).


Anyway... a migration from Oracle to SOLR? That is, you're going to throw 
Oracle out the window and completely replace it with SOLR? I would 
consider other aspects before your performance test... unless you 
have one flat table in Oracle, you should explain to your manager that 
there's a lot of work that needs to be done for that kind of migration 
(e.g. collecting all query requirements, denormalization).


Best,
Gazza


On 09/04/2013 02:06 PM, Sergio Stateri wrote:

Hi,

I'm trying to change the data access in the company where I work from
Oracle to Solr. So I made some tests, like this:

In Oracle:

private void go() throws Exception {
Class.forName("oracle.jdbc.driver.OracleDriver");
Connection conn =
DriverManager.getConnection("XXX");
PreparedStatement pstmt = conn.prepareStatement("SELECT DS_ROTEIRO FROM
cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
Date initialTime = new Date();
ResultSet rs = pstmt.executeQuery();
rs.next();
String desc = rs.getString(1);
System.out.println("total time:" + (new
Date().getTime()-initialTime.getTime()) + " ms");
System.out.println(desc);
rs.close();
pstmt.close();
conn.close();
}



And in Solr:

private void go() throws Exception {
String baseUrl = "http://localhost:8983/solr/";;
this.solrServerUrl = "http://localhost:8983/solr/roteiros/";;
server = new HttpSolrServer(solrUrl);
  String docId = AddOneRoteiroToCollection.docId;
  HttpSolrServer solr = new HttpSolrServer(baseUrl);
SolrServer solrServer = new HttpSolrServer(solrServerUrl);

solr.setRequestWriter(new BinaryRequestWriter());
SolrQuery query = new SolrQuery();
  query.setQuery("(id:" + docId + ")"); // search by id
query.addField("id");
query.addField("descricaoRoteiro");

extrairEApresentarResultados(query);
  }

private void extrairEApresentarResultados(SolrQuery query) throws
SolrServerException {
Date initialTime = new Date();
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
long now = new Date().getTime()-initialTime.getTime(); // HERE I'M CHECKING
THE SOLR RESPONSE TIME
  for (SolrDocument solrDocument : docs) {
System.out.println(solrDocument);
}
System.out.println("Total de documentos encontrados: " + docs.size());
System.out.println("Tempo total: " + now + " ms");
}


"descricaoRoteiro" is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is in Solr under the name "id" (it's the same data).
Solr is on the same machine, and Solr and Oracle have the same number of
records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns in around 20 ms (and the Oracle server is in another company; I'm using
a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they're a query inside another query
that was returned previously.


Thanks for any help. I'd like so much to use Solr, but I really don't know
how to explain this to my managers.






solr performance against oracle

2013-09-04 Thread Sergio Stateri
Hi,

I'm trying to change the data access in the company where I work from
Oracle to Solr. So I made some tests, like this:

In Oracle:

private void go() throws Exception {
Class.forName("oracle.jdbc.driver.OracleDriver");
Connection conn =
DriverManager.getConnection("XXX");
PreparedStatement pstmt = conn.prepareStatement("SELECT DS_ROTEIRO FROM
cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
Date initialTime = new Date();
ResultSet rs = pstmt.executeQuery();
rs.next();
String desc = rs.getString(1);
System.out.println("total time:" + (new
Date().getTime()-initialTime.getTime()) + " ms");
System.out.println(desc);
rs.close();
pstmt.close();
conn.close();
}



And in Solr:

private void go() throws Exception {
String baseUrl = "http://localhost:8983/solr/";;
this.solrServerUrl = "http://localhost:8983/solr/roteiros/";;
server = new HttpSolrServer(solrUrl);
 String docId = AddOneRoteiroToCollection.docId;
 HttpSolrServer solr = new HttpSolrServer(baseUrl);
SolrServer solrServer = new HttpSolrServer(solrServerUrl);

solr.setRequestWriter(new BinaryRequestWriter());
SolrQuery query = new SolrQuery();
 query.setQuery("(id:" + docId + ")"); // search by id
query.addField("id");
query.addField("descricaoRoteiro");

extrairEApresentarResultados(query);
 }

private void extrairEApresentarResultados(SolrQuery query) throws
SolrServerException {
Date initialTime = new Date();
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
long now = new Date().getTime()-initialTime.getTime(); // HERE I'M CHECKING
THE SOLR RESPONSE TIME
 for (SolrDocument solrDocument : docs) {
System.out.println(solrDocument);
}
System.out.println("Total de documentos encontrados: " + docs.size());
System.out.println("Tempo total: " + now + " ms");
}


"descricaoRoteiro" is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is in Solr under the name "id" (it's the same data).
Solr is on the same machine, and Solr and Oracle have the same number of
records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns in around 20 ms (and the Oracle server is in another company; I'm using
a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they're a query inside another query
that was returned previously.


Thanks for any help. I'd like so much to use Solr, but I really don't know
how to explain this to my managers.


-- 
Sergio Stateri Jr.
stat...@gmail.com
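One thing worth checking before drawing conclusions from numbers like these: timing a single call with new Date() folds HTTP connection setup, JIT warm-up and response parsing into the result. A fairer comparison averages many warmed-up calls, e.g. with a small harness like this (a sketch; the Runnable would wrap server.query(query) or pstmt.executeQuery()):

```java
public class MicroBench {
    // Hypothetical harness: average wall time per call after a warm-up
    // phase. A single cold measurement includes one-time costs, which is
    // part of why a lone 150~200 ms figure can be misleading.
    static double avgMillis(Runnable call, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) call.run();   // warm JIT, connections
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) call.run();
        return (System.nanoTime() - start) / 1_000_000.0 / measured;
    }

    public static void main(String[] args) {
        Runnable work = new Runnable() {               // stand-in for a real query
            public void run() { Math.sqrt(42.0); }
        };
        System.out.println("avg ms/call: " + avgMillis(work, 1000, 1000));
    }
}
```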


Re: Indexing pdf files - question.

2013-09-04 Thread Nutan Shinde
My solrconfig.xml is:

 





desc   

true

attr_

true







 

Schema.xml:

 

  

  





 















doc_id

 

I have created an extract directory and copied all required .jar and solr-cell
jar files into this extract directory, and given its path in the lib tag in
solrconfig.xml.

 

When I try out this:

 

curl
"http://localhost:8080/solr/update/extract?literal.doc_id=1&commit=true";

-F myfile=@solr-word.pdf    in Windows 7.

 

I get "/solr/update/extract is not available" and sometimes I get an access
denied error.

I tried resolving it through the net, but in vain, as all the solutions are
related to Linux; I'm working on Windows.

Please help me and provide solutions related to Windows.

I referred to Apache_solr_4_Cookbook.

Thanks a lot.



Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-04 Thread maephisto
Thanks Shawn!

Indeed, setting the JAVA_OPTS and restarting Tomcat did the trick.
Currently I'm exploring and experimenting with SolrCloud, thus I used
only one ZK.
For a production environment your suggestion would, of course, be mandatory.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Starting-Solr-in-Tomcat-with-specifying-ZK-host-s-tp4087916p4088164.html
Sent from the Solr - User mailing list archive at Nabble.com.
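For reference, the Tomcat-side setting amounts to something like this (hostnames are placeholders; typically placed in bin/setenv.sh):

```shell
# Point every Solr core in this Tomcat at the ZooKeeper ensemble.
# A single ZK host is fine for experiments; production wants an
# odd-sized ensemble, e.g. three nodes as below.
export JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181"
```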


Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
I wonder if anyone could point me in the right direction please?

If I search on the phrase "the toolkit" I get hits containing that phrase but 
also hits that have the word 'the' before the word 'toolkit', no matter how far 
apart they are.

Also, if I search on the word 'the' there are no hits at all.

Thanks,

Alistair

-
mov eax,1
mov ebx,0
int 80


Re: Measuring SOLR performance

2013-09-04 Thread Dmitry Kan
Hi Roman,

Ok, I will. Thanks!

Cheers,
Dmitry


On Tue, Sep 3, 2013 at 4:46 PM, Roman Chyla  wrote:

> Hi Dmitry,
>
> Thanks for the feedback. Yes, it is indeed jmeter issue (or rather, the
> issue of the plugin we use to generate charts). You may want to use the
> github for whatever comes next
>
> https://github.com/romanchyla/solrjmeter/issues
>
> Cheers,
>
>   roman
>
>
> On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan  wrote:
>
> > Hi Roman,
> >
> > Thanks, the --additionalSolrParams was just what I wanted and works fine.
> >
> > BTW, if you have some special "bug tracking forum" for the tool, I'm
> happy
> > to submit questions / bug reports there. Otherwise, this email list is ok
> > (for me at least).
> >
> > One other thing I have noticed in the err logs was a series of messages
> of
> > this sort upon generating the perf test report. Seems to be jmeter
> related
> > (the err messages disappear, if extra lib dir is present under ext
> > directory).
> >
> > java.lang.Throwable: Could not access
> > /home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib
> > at
> >
> kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
> > at kg.apc.cmd.UniversalRunner.(UniversalRunner.java:55)
> > at
> >
> kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
> > at kg.apc.cmd.UniversalRunner.(UniversalRunner.java:55)
> >
> > at
> >
> kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109)
> > at kg.apc.cmd.UniversalRunner.(UniversalRunner.java:55)
> >
> >
> >
> > On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla 
> wrote:
> >
> > > Hi Dmitry,
> > >
> > > If it is something you want to pass with every request (which is my use
> > > case), you can pass it as additional solr params, eg.
> > >
> > > python solrjmeter
> > >
> > >
> >
> --additionalSolrParams="fq=other_field:bar+facet=true+facet.field=facet_field_name"
> > > 
> > >
> > > the string should be url encoded.
> > >
> > > If it is something that changes with every request, you should modify
> the
> > > jmeter test. If you open/load it with jmeter GUI, in the HTTP request
> > > processor you can define other additional fields to pass with the
> > request.
> > > These values can come from the CSV file, you'll see an example how to
> use
> > > that when you open the test difinition file.
> > >
> > > Cheers,
> > >
> > >   roman
> > >
> > >
> > >
> > >
> > > On Mon, Sep 2, 2013 at 3:12 PM, Dmitry Kan 
> wrote:
> > >
> > > > Hi Erick,
> > > >
> > > > Agree, this is perfectly fine to mix them in solr. But my question is
> > > about
> > > > solrjmeter input query format. Just couldn't find a suitable example
> on
> > > the
> > > > solrjmeter's github.
> > > >
> > > > Dmitry
> > > >
> > > >
> > > >
> > > > On Mon, Sep 2, 2013 at 5:40 PM, Erick Erickson <
> > erickerick...@gmail.com
> > > > >wrote:
> > > >
> > > > > filter and facet queries can be freely intermixed, it's not a
> > problem.
> > > > > What problem are you seeing when you try this?
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > >
> > > > > On Mon, Sep 2, 2013 at 7:46 AM, Dmitry Kan 
> > > wrote:
> > > > >
> > > > > > Hi Roman,
> > > > > >
> > > > > > What's the format for running the facet+filter queries?
> > > > > >
> > > > > > Would something like this work:
> > > > > >
> > > > > > field:foo  >=50  fq=other_field:bar facet=true
> > > > > facet.field=facet_field_name
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Dmitry
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 23, 2013 at 2:34 PM, Dmitry Kan <
> solrexp...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Roman,
> > > > > > >
> > > > > > > With adminPath="/admin" or adminPath="/admin/cores", no.
> > > > Interestingly
> > > > > > > enough, though, I can access
> > > > > > > http://localhost:8983/solr/statements/admin/system
> > > > > > >
> > > > > > > But I can access http://localhost:8983/solr/admin/cores, only
> > when
> > > > > with
> > > > > > > adminPath="/admin/cores" (which suggests that this is the right
> > > value
> > > > > to
> > > > > > be
> > > > > > > used for cores), and not with adminPath="/admin".
> > > > > > >
> > > > > > > Bottom line, this core configuration is not self-evident.
> > > > > > >
> > > > > > > Dmitry
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla <
> > > roman.ch...@gmail.com
> > > > > > >wrote:
> > > > > > >
> > > > > > >> Hi Dmitry,
> > > > > > >> So it seems solrjmeter should not assume the adminPath - and
> > > perhaps
> > > > > > needs
> > > > > > >> to be passed as an argument. When you set the adminPath, are
> you
> > > > able
> > > > > to
> > > > > > >> access localhost:8983/solr/statements/admin/cores ?
> > > > > > >>
> > > > > > >> roman
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan <
> > solrexp...@gmail.com
> > > >
> > > > > > wrote:
> > > 

Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Shalin Shekhar Mangar
No, that wouldn't work. It seems that you probably need a custom
Transformer to extract the right div content. I do not know if
TikaEntityProcessor supports such a thing.
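The core job such a Transformer would do is pull one div's inner HTML out of the page before the field is indexed. As a rough illustration of that logic only (this is not DIH API code; the function name and regex approach are made up for illustration, and the regex assumes the target div contains no nested divs):

```python
import re

def extract_div(html, div_id="content"):
    # Crude inner-HTML extraction for a single div identified by id.
    # A production transformer should use a real HTML parser instead,
    # since regexes break on nested or malformed markup.
    pattern = re.compile(
        r'<div\b[^>]*\bid="%s"[^>]*>(.*?)</div>' % re.escape(div_id),
        re.DOTALL | re.IGNORECASE,
    )
    m = pattern.search(html)
    return m.group(1).strip() if m else None

page = ('<html><body><div id="nav">menu</div>'
        '<div id="content">Hello <b>world</b></div></body></html>')
print(extract_div(page))  # Hello <b>world</b>
```

The same idea could be dropped into DIH as a Java Transformer whose transformRow() rewrites the "text" column before indexing.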

On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen  wrote:
> so could i just nest it in an XPathEntityProcessor to filter the html or is 
> there something like xpath for tika?
>
>  forEach="/div[@id='content']" dataSource="main">
>  url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" 
> format="html" >
> 
> 
> 
>
> but now i don't know how to pass the text to tika, what do i put in url and 
> datasource?
>
>
> On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:
>
>> I don't know much about Tika but in the example data-config.xml that
>> you posted, the "xpath" attribute on the field "text" won't work
>> because the xpath attribute is used only by a XPathEntityProcessor.
>>
>> On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen  wrote:
>>> I want tika to only index the content in ... for 
>>> the field "text". unfortunately it's indexing the whole page. Can't xpath do 
>>> this?
>>>
>>> data-config.xml:
>>>
>>> 
>>>
>>>
>>>
>>> 
>>>>> url="http://127.0.0.1/tkb/internet/docImportUrl.xml"; forEach="/docs/doc" 
>>> dataSource="main"> 
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>>> url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip" 
>>> htmlMapper="identity" format="html" >
>>>
>>>
>>>
>>>
>>> 
>>> 
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Change the score of a document based on the *value* of a multifield using dismax

2013-09-04 Thread danielitos85
Thanks a lot David. 
I will try it ;)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-tp4087503p4088145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
so could i just nest it in an XPathEntityProcessor to filter the html or is 
there something like xpath for tika?







but now i don't know how to pass the text to tika, what do i put in url and 
datasource?


On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

> I don't know much about Tika but in the example data-config.xml that
> you posted, the "xpath" attribute on the field "text" won't work
> because the xpath attribute is used only by a XPathEntityProcessor.
> 
> On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen  wrote:
>> I want tika to only index the content in ... for the 
>> field "text". unfortunately it's indexing the whole page. Can't xpath do this?
>> 
>> data-config.xml:
>> 
>> 
>>
>>
>>
>> 
>>> url="http://127.0.0.1/tkb/internet/docImportUrl.xml"; forEach="/docs/doc" 
>> dataSource="main"> 
>>
>>
>>
>>
>>
>>
>> 
>>> url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip" 
>> htmlMapper="identity" format="html" >
>>
>> 
>>
>>
>> 
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.



Re: DIH + Solr Cloud

2013-09-04 Thread Tim Vaillancourt

Hey Alejandro,

I guess it depends on what you mean by "more than one instance".

The request handlers are at the core-level, and not the Solr 
instance/global level, and within each of those cores you could have one 
or more data import handlers.


Most setups have 1 DIH per core at the handler location "/dataimport", 
but I believe you could have several, ie: "/dataimport2", "/dataimport3" 
if you had different DIH configs for each handler.


Within a single data import handler, you can have several "entities", 
which tell the DIH process how to get/index the data. 
What you can do here is have several entities that construct your index, 
and execute those entities with several separate HTTP calls to the DIH, 
thus creating more than one instance of the DIH process within 1 core 
and 1 DIH handler.


ie:

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers" &
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts" &
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies" &


http://wiki.apache.org/solr/DataImportHandler#Commands
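For the several-handlers variant mentioned above, each handler is just another requestHandler entry in solrconfig.xml pointing at its own DIH config. A sketch (the handler paths and config file names below are hypothetical):

```xml
<!-- Hypothetical solrconfig.xml fragment: two independent DIH handlers
     in the same core, each with its own DIH config file. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-suppliers.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-parts.xml</str>
  </lst>
</requestHandler>
```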

Cheers,

Tim

On 03/09/13 09:25 AM, Alejandro Calbazana wrote:

Hi,

Quick question about data import handlers in Solr cloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box set up as only the DIH and keep this responsibility outside
of the Solr cloud environment?  I'm just trying to get a picture of how this
is typically deployed.

Thanks!

Alejandro



unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
i am using Solr 4.2 on Windows 7
my schema is:









solrconfig.xml:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

when i execute:
curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

i get error: unknown field ignored_stream_source_info.

i referred to Solr Cookbook 3.1 and Solr Cookbook 4 but the error is not resolved.
please help me.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com.
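A note for anyone hitting the same error: with uprefix set to ignored_, unknown Tika metadata fields such as stream_source_info are renamed to ignored_stream_source_info, so the schema needs a dynamic field that accepts ignored_* names. A sketch of the usual declaration (assuming the stock "ignored" field type from the example schema; adjust to your own types):

```xml
<!-- schema.xml: catch-all for Tika metadata renamed by uprefix="ignored_" -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>

<!-- and, if not already present, the no-op field type it relies on -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>
```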