unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
I am using Solr 4.2 on Windows 7. My schema is: <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="author" type="string" indexed="true" stored="true" multiValued="true"/> <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/> <field name="keywords" type="text"

Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
So could I just nest it in an XPathEntityProcessor to filter the HTML, or is there something like XPath for Tika? <entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main"> <entity name="tika" processor="TikaEntityProcessor"

Re: DIH + Solr Cloud

2013-09-04 Thread Tim Vaillancourt
Hey Alejandro, I guess it means what you call more than one instance. The request handlers are at the core-level, and not the Solr instance/global level, and within each of those cores you could have one or more data import handlers. Most setups have 1 DIH per core at the handler location

Re: Change the score of a document based on the *value* of a multifield using dismax

2013-09-04 Thread danielitos85
Thanks a lot David. I will try it ;)

Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Shalin Shekhar Mangar
No, that wouldn't work. It seems that you probably need a custom Transformer to extract the right div content. I do not know if TikaEntityProcessor supports such a thing. On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote: So could I just nest it in an XPathEntityProcessor to filter
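A minimal sketch of the extraction step such a custom Transformer might perform, assuming the Tika output reaches it as a well-formed XHTML string. DivExtractor and extractDivText are hypothetical names, and the wiring into DIH's Transformer class is deliberately left out; only the JDK's own XML and XPath classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: the div-extraction step a custom DIH Transformer might run. */
class DivExtractor {

    /** Returns the trimmed text of the first div with the given id, or null if none exists. */
    static String extractDivText(String xhtml, String divId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Tika emits namespaced XHTML; a non-namespace-aware parse lets the
            // un-prefixed XPath below match <div> without a NamespaceContext.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // divId is assumed to be a trusted value without quote characters.
            Node div = (Node) xpath.evaluate(
                    "//div[@id='" + divId + "']", doc, XPathConstants.NODE);
            return div == null ? null : div.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML or evaluate XPath", e);
        }
    }
}
```

A Transformer built around this would read the Tika text column from the row, call extractDivText with "content", and put the result back; that part depends on the DIH API of your Solr version.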

Re: Measuring SOLR performance

2013-09-04 Thread Dmitry Kan
Hi Roman, Ok, I will. Thanks! Cheers, Dmitry On Tue, Sep 3, 2013 at 4:46 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, Thanks for the feedback. Yes, it is indeed a JMeter issue (or rather, an issue with the plugin we use to generate charts). You may want to use the github for

Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
I wonder if anyone could point me in the right direction, please? If I search on the phrase 'the toolkit' I get hits containing that phrase, but also hits that have the word 'the' before the word 'toolkit', no matter how far apart they are. Also, if I search on the word 'the' there are no hits at

Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-04 Thread maephisto
Thanks Shawn! Indeed, setting JAVA_OPTS and restarting Tomcat did the trick. Currently I'm exploring and experimenting with SolrCloud, so I only used one ZK. For a production environment your suggestion would, of course, be mandatory.

Re: Indexing pdf files - question.

2013-09-04 Thread Nutan Shinde
My solrconfig.xml is: <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler"> <lst name="defaults"> <str name="fmap.content">desc</str> <!-- to map this field of my table, which is defined as shown below in schema.xml --> <str name="lowernames">true</str> <str name="uprefix">attr_</str>

solr performance against oracle

2013-09-04 Thread Sergio Stateri
Hi, I'm trying to move data access in the company where I work from Oracle to Solr, so I ran some tests, like this one. In Oracle: private void go() throws Exception { Class.forName("oracle.jdbc.driver.OracleDriver"); Connection conn =

Re: solr performance against oracle

2013-09-04 Thread Andrea Gazzarini
You said nothing about your environments (e.g. operating systems, what kind of Oracle installation you have, what kind of Solr installation, how much data is in the database, how many documents are in the index, RAM for Solr, for Oracle, for the OS, and hardware in general... and so on)... Anyway... a migration

Re: Strange behaviour with single word and phrase

2013-09-04 Thread Jack Krupansky
Do you have stop word filtering enabled? What does your field type look like? If stop words are ignored, you will get exactly the behavior you described. -- Jack Krupansky -Original Message- From: Alistair Young Sent: Wednesday, September 04, 2013 6:57 AM To:
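A toy Java model of the behaviour Jack describes, assuming indexing and querying both run the same chain of whitespace tokenizing, lowercasing and stop word removal. The stop list and the StopWordDemo class are hypothetical, and real Lucene filters also track token positions, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer chain: whitespace tokenizer, lowercase filter, stop filter. */
class StopWordDemo {
    // Hypothetical stop list; a Solr field type normally loads one from a file such as stopwords.txt.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "an", "of"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // Drop empty strings (from blank input) and stop words.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With 'the' on the stop list, analyze("the toolkit") keeps only "toolkit", so the phrase query degrades to a single-term query, and analyze("the") keeps nothing, which is consistent with getting no hits for 'the' on its own.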

Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Jack Krupansky
Did you restart Solr after editing config and schema? -- Jack Krupansky -Original Message- From: Nutan Sent: Wednesday, September 04, 2013 3:07 AM To: solr-user@lucene.apache.org Subject: unknown _stream_source_info while indexing rich doc in solr i am using solr4.2 on windows7 my

RE: Solr Cloud hangs when replicating updates

2013-09-04 Thread Greg Walters
Kevin, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue you're reporting for a while, then I applied the patch from SOLR-4816 to my clients and the problems went

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Greg Walters
Tim, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue you're reporting for a while, then I applied the patch from SOLR-4816 to my clients and the problems went away.

Re: Strange behaviour with single word and phrase

2013-09-04 Thread Alistair Young
Yep, ignoring stop words. Thanks for the pointer. Alistair - mov eax,1 mov ebx,0 int 80 On 04/09/2013 13:43, Jack Krupansky j...@basetechnology.com wrote: Do you have stop word filtering enabled? What does your field type look like? If stop words are ignored, you will get

Re: dataimporter tika doesn't extract certain div

2013-09-04 Thread Andreas Owen
Or could I use a filter in schema.xml, where I define a fieldType and use some filter that understands XPath? On 4. Sep 2013, at 11:52 AM, Shalin Shekhar Mangar wrote: No, that wouldn't work. It seems that you probably need a custom Transformer to extract the right div content. I do not know if

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys,

Re: Boost by numFounds

2013-09-04 Thread Flavio Pompermaier
I found that ExternalFileField can do the trick for PageRank-like indexing! Is there any help for uploading the external files to all Solr servers (in Solr 3 and SolrCloud)? Or should I copy them to every Solr instance's data folder and then reload their caches? On Sat, Aug 24, 2013 at 12:36 AM,
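A minimal sketch of the "copy it to every instance" option, assuming the documented ExternalFileField layout: a file named external_<fieldname> in each core's data directory, with one key=value line per document, where the key is the uniqueKey value. ExternalFileWriter, the field name "pagerank" and the directory list are hypothetical. Solr still has to pick up the new values afterwards (for example when a new searcher is opened); check how your version reloads them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: writes one ExternalFileField data file into several Solr data dirs. */
class ExternalFileWriter {

    static List<Path> writeToAll(String fieldName, Map<String, Float> values, List<Path> dataDirs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Float> entry : values.entrySet()) {
            // One "uniqueKeyValue=floatValue" pair per line.
            lines.add(entry.getKey() + "=" + entry.getValue());
        }
        List<Path> written = new ArrayList<>();
        for (Path dir : dataDirs) {
            Path target = dir.resolve("external_" + fieldName);
            try {
                // Write under a name that does not start with "external_", then rename,
                // so a half-written file is never visible under the expected name.
                Path tmp = Files.createTempFile(dir, "tmp-", ".part");
                Files.write(tmp, lines, StandardCharsets.UTF_8);
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.ATOMIC_MOVE);
                written.add(target);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return written;
    }
}
```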

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Kevin Osborn
I am having this issue as well. I did apply this patch. Unfortunately, it did not resolve the issue in my case. On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters gwalt...@sherpaanalytics.com wrote: Tim, Take a look at

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Kevin Osborn
Thanks. If there is anything I can do to help you resolve this issue, let me know. -Kevin On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote: I'll look at fixing the root issue for 4.5. I've been putting it off for way too long. Mark On Sep 3,

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
I'll look at fixing the root issue for 4.5. I've been putting it off for way too long. Mark On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: I was having problems updating SolrCloud with a large batch of records. The records are coming in bursts with

Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0

2013-09-04 Thread Sukanta Dey
Hi Team, In my project I am going to use Apache Solr 4.4.0 for searching. While doing that, I need to join multiple Solr documents within the same core on one of the fields common across the documents. Though I successfully join the documents using the Solr 4.4.0 join syntax, it is

How to config SOLR server for spell check functionality

2013-09-04 Thread sebastian.manolescu
I want to implement the spell check functionality offered by Solr using a MySQL database, but I don't understand how. Here is the basic flow of what I want to do: I have a simple inputText (in JSF), and if I type the word 'shwo', the response in the OutputLabel should be 'show'. First of all, I'm using the following

Re: solr performance against oracle

2013-09-04 Thread Toke Eskildsen
On Wed, 2013-09-04 at 14:06 +0200, Sergio Stateri wrote: I'm trying to change the data access in the company where I work from Oracle to Solr. They work on different principles and fulfill different needs. Comparing them with a performance-oriented test is not likely to be a usable point for
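Whatever is being compared, a single cold run mostly measures JVM warm-up and cache state. A minimal sketch of a somewhat fairer measurement loop, assuming the Oracle or Solr query under test is wrapped in a Callable; QueryTimer and its parameters are hypothetical, and it does nothing about the environment differences raised in this thread.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

/** Hypothetical timing helper: warm-up iterations, then the median of several timed runs. */
class QueryTimer {

    static long medianNanos(Callable<?> query, int warmups, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            query.call();   // untimed: lets the JIT, connection pools and caches warm up
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.call();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        // Median (upper median for even counts) is less sensitive to GC pauses than the mean.
        return samples[runs / 2];
    }
}
```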

Solr highlighting fragment issue

2013-09-04 Thread Sreehareesh Kaipravan Meethaleveetil
Hi, I'm having some issues with Solr search results (using Solr 1.4). I have enabled highlighting of the searched text (hl=true) and set the fragment size to 500 (hl.fragsize=500) in the search query. Below are the results (screenshot) shown when I searched for the term 'grandfather' (2 results

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
There is an issue if I remember right, but I can't find it right now. If anyone that has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got

Re: Solr Cloud hangs when replicating updates

2013-09-04 Thread Mark Miller
It would be great if you could give this patch a try: http://pastebin.com/raw.php?i=aaRWwSGP - Mark On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn kevin.osb...@cbsi.com wrote: Thanks. If there is anything I can do to help you resolve this issue, let me know. -Kevin On Wed, Sep 4, 2013 at

Questions about Replication Factor on solrcloud

2013-09-04 Thread Lisandro Montaño
Hi all, I’m currently working on deploying a SolrCloud distribution on CentOS machines and wanted more guidance about Replication Factor configuration. I have configured two servers with SolrCloud over Tomcat and a third server as ZooKeeper. I have configured successfully and have
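A minimal planning sketch for the Collections API CREATE parameters numShards, replicationFactor and maxShardsPerNode, assuming (as in Solr 4.x, to the best of my knowledge) that replicationFactor counts every copy of a shard including the leader, and that a CREATE is refused when numShards * replicationFactor exceeds liveNodes * maxShardsPerNode. CollectionPlan, the host and the collection name are hypothetical; verify the parameters against the Collections API documentation for your version.

```java
/** Hypothetical planning helper for a SolrCloud Collections API CREATE call. */
class CollectionPlan {

    /** replicationFactor counts every copy of a shard, leader included. */
    static int totalCores(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    /** True if the live nodes can host all cores without exceeding maxShardsPerNode. */
    static boolean fits(int numShards, int replicationFactor, int liveNodes, int maxShardsPerNode) {
        return totalCores(numShards, replicationFactor) <= liveNodes * maxShardsPerNode;
    }

    /** Builds the CREATE request URL (collection name assumed URL-safe). */
    static String createUrl(String solrBase, String name, int numShards,
                            int replicationFactor, int maxShardsPerNode) {
        return solrBase + "/admin/collections?action=CREATE&name=" + name
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&maxShardsPerNode=" + maxShardsPerNode;
    }
}
```

With two Solr servers, numShards=1 and replicationFactor=2 gives one leader and one replica, one per server; numShards=2 with replicationFactor=2 needs four cores and therefore maxShardsPerNode=2.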

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks guys! :) Mark: this patch is much appreciated, I will try to test this shortly, hopefully today. For my curiosity/understanding, could someone explain to me quickly what locks SolrCloud takes on updates? Was I on to something that more shards decrease the chance for locking? Secondly, I

Re: cleanup after OutOfMemoryError

2013-09-04 Thread Mark Miller
I don't know that there is any 'safe' thing you can do other than restart - but if I were to try anything, I would use true for rollback. - Mark On Wed, Sep 4, 2013 at 9:44 AM, Ryan McKinley ryan...@gmail.com wrote: I have an application where I am calling DirectUpdateHandler2 directly with:

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Mark Miller
The 'lock' or semaphore was added to cap the number of threads that would be used. Previously, the number of threads in use could spike to many, many thousands on heavy updates. A limit on the number of outstanding requests was put in place to keep this from happening. Something like 16 * the
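A minimal sketch of the general pattern Mark describes, a semaphore that caps the number of outstanding requests, assuming a plain thread pool rather than Solr's actual update code. BoundedUpdateSender and maxOutstanding are hypothetical names, and the real limit Solr uses is not reproduced here. Note that acquire() blocks the caller once the cap is reached, so any path that fails to release a permit stalls every later update.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: cap the number of outstanding update requests with a semaphore. */
class BoundedUpdateSender {
    private final Semaphore permits;
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    BoundedUpdateSender(int maxOutstanding) {
        this.permits = new Semaphore(maxOutstanding);
    }

    /** Blocks the caller while maxOutstanding requests are already running. */
    void send(Runnable request) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                maxInFlight.accumulateAndGet(now, Math::max);   // record the peak concurrency
                try {
                    request.run();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release();   // always free the slot, even if the request fails
                }
            });
        } catch (RejectedExecutionException e) {
            permits.release();   // the task never started, so give the permit back
            throw e;
        }
    }

    /** Waits for queued work to finish and returns the highest observed concurrency. */
    int shutdownAndGetPeak() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return maxInFlight.get();
    }
}
```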

cleanup after OutOfMemoryError

2013-09-04 Thread Ryan McKinley
I have an application where I am calling DirectUpdateHandler2 directly with: update.addDoc(cmd); This will sometimes hit: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248) at

subindex

2013-09-04 Thread Peyman Faratin
Hi, is there a way to build a new (smaller) index from an existing (larger) index, where the smaller index contains a subset of the fields of the larger index? Thank you.

RE: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Markus Jelsma
Hi Mark, Got an issue to watch? Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for

Re: Numeric fields and payload

2013-09-04 Thread PETER LENAHAN
Chris Hostetter hossman_lucene at fucit.org writes: : is it possible to store (text) payload to numeric fields (class : solr.TrieDoubleField)? My goal is to store measure units to numeric : features - e.g. '1.5 cm' - and to use faceted search with these fields. : But the field type

Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Hi, http://wiki.apache.org/solr/XsltResponseWriter (and the reference manual PDF too) are out of date: in the configuration section <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter"> <int name="xsltCacheLifetimeSeconds">5</int> </queryResponseWriter> class name

RE: Solr highlighting fragment issue

2013-09-04 Thread Bryan Loofbourrow
I’m having some issues with Solr search results (using Solr 1.4). I have enabled highlighting of the searched text (hl=true) and set the fragment size to 500 (hl.fragsize=500) in the search query. Below are the results (screenshot) shown when I searched for the term ‘grandfather’ (2 results are

Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Upayavira
It's a wiki. Can't you correct it? Upayavira On Wed, Sep 4, 2013, at 08:25 PM, Dmitri Popov wrote: Hi, http://wiki.apache.org/solr/XsltResponseWriter (and the reference manual PDF too) are out of date: in the configuration section <queryResponseWriter name="xslt"

Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)

2013-09-04 Thread Dmitri Popov
Upayavira, I could edit that page myself, but I need to be confirmed as human according to http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki. My wiki account name is 'pin', just in case. On Wed, Sep 4, 2013 at 5:27 PM, Upayavira u...@odoko.co.uk wrote: It's a wiki. Can't you correct it?

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks so much for the explanation, Mark, I owe you one (many)! We have this on our high-TPS cluster and will run it through its paces tomorrow. I'll provide any feedback I can, more soon! :D Cheers, Tim

Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi solr users, I'm testing replication within SolrCloud. I just uncommented the replication section separately on the master and slave nodes. The replication section setting on the master node: <lst name="master"> <str name="replicateAfter">commit</str> <str

Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi again, I'm using Solr 4.4. 2013/9/5 YouPeng Yang yypvsxf19870...@gmail.com Hi solr users, I'm testing replication within SolrCloud. I just uncommented the replication section separately on the master and slave nodes. The replication section setting on the master node:

Re: Invalid Version when slave node pull replication from master node

2013-09-04 Thread YouPeng Yang
Hi all, I solved the problem by adding the coreName explicitly, according to http://wiki.apache.org/solr/SolrReplication#Replicating_solrconfig.xml. But I want to make sure: is it necessary to set the coreName explicitly? Is there any SolrJ API to pull the replication on the slave node
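This does not answer the SolrJ question; it is a minimal sketch of the plain-HTTP route, assuming the ReplicationHandler's fetchindex command and optional masterUrl parameter as described on the SolrReplication wiki page. The core name appears in the path because the handler is registered per core. FetchIndexTrigger, the host names and the core name are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: asks a slave core's ReplicationHandler to pull the index now. */
class FetchIndexTrigger {

    /** Builds e.g. http://slave:8983/solr/collection1/replication?command=fetchindex */
    static String fetchIndexUrl(String solrBase, String coreName, String masterUrl) {
        StringBuilder url = new StringBuilder(solrBase)
                .append('/').append(coreName)
                .append("/replication?command=fetchindex");
        if (masterUrl != null) {
            // Optional one-off override of the masterUrl configured on the slave.
            url.append("&masterUrl=")
               .append(URLEncoder.encode(masterUrl, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    /** Sends a GET to the given URL and returns the HTTP status code. */
    static int trigger(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```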

Re: unknown _stream_source_info while indexing rich doc in solr

2013-09-04 Thread Nutan
Yes sir, I did restart Tomcat. On Wed, Sep 4, 2013 at 6:27 PM, Jack Krupansky-2 [via Lucene] ml-node+s472066n4088181...@n3.nabble.com wrote: Did you restart Solr after editing config and schema? -- Jack Krupansky -Original Message- From: Nutan Sent: Wednesday, September 04,