RE: how are you using Solr?

2010-09-26 Thread Markus Jelsma
http://wiki.apache.org/solr/PublicServers http://www.lucidimagination.com/developer/Community/Application-Showcase-Wiki   -Original message- From: Girish Pandit pandit.gir...@gmail.com Sent: Sun 26-09-2010 14:16 To: solr-user@lucene.apache.org; Subject: how are you using Solr? I am

RE: spellcheck on multiple fields?

2010-09-27 Thread Markus Jelsma
You can use copyField to get multiple fields in the field you use for spell checking, don't forget to set it to multiValued.   -Original message- From: Savannah Beckett savannah_becket...@yahoo.com Sent: Mon 27-09-2010 10:08 To: solr-user@lucene.apache.org; Subject: spellcheck on

RE: Solr Deduplication and Field Collapsing

2010-09-28 Thread Markus Jelsma
You could create a custom update processor that adds a digest field for newly added documents that do not have the digest field themselves. This way, the documents that are not added by Nutch get a proper non-empty digest field so the deduplication processor won't create the same empty hash and
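The idea in the message above (give every document a non-empty digest before deduplication runs) can be illustrated with a small sketch. This is not the custom update processor the message proposes, which would be a server-side Solr plugin; it only shows the same logic applied to a plain dict before indexing. The helper name `ensure_digest` and the `content` source field are assumptions made for illustration; only the `digest` field name comes from the thread.

```python
import hashlib


def ensure_digest(doc, source_field="content", digest_field="digest"):
    """Return a copy of `doc` that is guaranteed to carry a digest value.

    Documents that already have a non-empty digest are returned unchanged
    (as a copy). Otherwise an MD5 hex digest of the assumed source field
    is added.
    """
    if doc.get(digest_field):
        return dict(doc)
    text = doc.get(source_field, "")
    updated = dict(doc)
    # Hash the text so that different content yields different digests.
    updated[digest_field] = hashlib.md5(text.encode("utf-8")).hexdigest()
    return updated
```

Note that documents with an empty source field would still share one digest, so a real implementation would need to hash something that is always present.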

Re: Solr and Greek Chars

2009-12-28 Thread Markus Jelsma
Hi, Did you post your documents in UTF-8? Also, for querying through GET using non-ASCII you must reconfigure Tomcat6 as per the manual [1]. Cheers, [1] http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config ZAROGKIKAS,GIORGOS said: Hi there I’m using solr 1.4 under

Re: Multi language support

2010-01-11 Thread Markus Jelsma
languages. So you would have a field type like: <fieldType name="en_text" class="solr.TextField" ...> <analyzer type="..."> <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/> etc etc. Cheers, - Markus Jelsma Buyways B.V

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly, I am not entirely sure if i understand your problem correctly. But i believe your first approach is the right one. Your question: Which products are available that contain skus with color Green, size M, and a price of $9.99 or less? can be easily answered using a schema like yours.

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
... -- sku id = 7 [color=green, size=S, price=10.99] -- sku id = 9 [color=green, size=L, price=10.99] -- sku id = 10 [color=blue, size=S, price=9.99] -- sku id = 11 [color=blue, size=M, price=10.99] -- sku id = 12 [color=blue, size=L, price=10.99] Regards, Kelly Markus Jelsma - Buyways B.V

Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Markus Jelsma
Hello, I now believe that i really did misunderstand the problem and, unfortunately, i don't believe i can be of much assistance as i did not have to implement a similar problem. Cheers, - Markus Jelsma Buyways B.V. Technisch ArchitectFriesestraatweg 215c http

LucidGaze, No Data

2010-01-20 Thread Markus Jelsma
Hello all, I have installed and reconfigured everything according to the readme supplied with the recent LucidGaze release. Files have been written in the gaze directory in SOLR_HOME but the *.log.x.y files are all empty! The rrd directory does contain something that is about 24MiB. In the

Re: LucidGaze, No Data

2010-01-25 Thread Markus Jelsma
Hi, Is the list without clue or should i mail Lucid directly? Cheers, I have installed and reconfigured everything according to the readme supplied with the recent LucidGaze release. Files have been written in the gaze directory in SOLR_HOME but the *.log.x.y files are all empty! The rrd

Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread Markus Jelsma
Hello Frank, Answers are inline: Frank van Lingen said: I recently started working with solr and find it easy to setup and tinker with. I now want to scale up my setup and was wondering if there is an application/component that can do the following (I was not able to find documentation on

Re: To store or not to store serialized objects in solr

2010-01-26 Thread Markus Jelsma
Hello Andre, We have used this approach before. We did keep all our data in a RDBMS but added serialized objects to the index so we could simply query the record and display it as is, without any hassle and SQL connections. Although storing this data sounds a bit strange, it actually works well

RE: update doc success, but could not find the new value

2010-01-27 Thread Markus Jelsma
Check out Jetty's output or Tomcat's logs. The logging is very verbose and you can get a clearer picture. Jennifer Luo said: I am using example, only with two fields, id and body. Id is string field, body is text field. I use another program to do a http post to update the document, url is

Re: Solr and location based searches

2010-02-02 Thread Markus Jelsma
://williams.best.vwh.net/avform.htm#Dist Cheers, Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: spellcheck

2010-02-11 Thread Markus Jelsma
the onlyMorePopular directive fool you, it caught many users off guard before :) Cheers, Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: spellcheck

2010-02-11 Thread Markus Jelsma
name="spellcheckIndexDir">spellcheckerfile</str> </lst> </searchComponent> -- View this message in context: http://old.nabble.com/spellcheck-tp27527425p27548078.html Sent from the Solr - User mailing list archive at Nabble.com. Markus Jelsma - Technisch Architect - Buyways BV http

Re: spellcheck

2010-02-11 Thread Markus Jelsma
</str> <str name="spellcheck.dictionary">external</str> </lst> <arr name="last-components"> <str>query</str> <str>spellcheck</str> <str>mlt</str> </arr> </requestHandler> Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: Spell check returns strange suggestion

2010-02-22 Thread Markus Jelsma
darniz said: Hello All Please reply to this ASAP I am using indexbasedSpellchecker right now i copy only model, and make names and some other fields to my spellcheck field. Hence my spell check field consists of only 120 words. The issue is if i type hond i get back honda which is fine.

Re: spellcheck all time

2010-02-23 Thread Markus Jelsma
Although the wiki states it correctly (will also return suggestions even if properly spelled), perhaps we should add that it's a better practice to only present end users with suggestions if the correctlySpelled flag is false. This issue keeps coming back. Chris Hostetter said: : I have a
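The practice suggested above (only show suggestions when the correctlySpelled flag is false) could be applied on the client side roughly as follows. This is a minimal sketch, assuming the response has already been parsed from JSON into a dict and that `correctlySpelled` sits directly under `spellcheck`; where the flag appears differs between Solr versions, and the helper name `suggestions_to_show` is hypothetical.

```python
def suggestions_to_show(response):
    """Return spellcheck suggestions only when the query was flagged as misspelled.

    Assumes `response` is a parsed JSON dict shaped like
    {"spellcheck": {"correctlySpelled": bool, "suggestions": [...]}}.
    """
    spellcheck = response.get("spellcheck", {})
    # Treat a missing flag as "correctly spelled" so nothing is shown by accident.
    if spellcheck.get("correctlySpelled", True):
        return []
    return spellcheck.get("suggestions", [])
```

Defaulting to "correctly spelled" when the flag is absent is a design choice: it errs on the side of hiding suggestions rather than offering them for valid queries.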

Re: logging

2010-02-23 Thread Markus Jelsma
Hi Peter, It depends on what you call a debug log and how you interface with Solr. Anyway, if you use Solr over HTTP you can check out the logs of your servlet container and configure the logging behaviour on the Solr web admin page. Usually, the default logging is quite useful. Either way, see

Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
in Solr 1.5: - field collapsing - Solr Spatial On Wednesday 24 February 2010 14:42:18 Grant Ingersoll wrote: What would it be? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
Robert Muir wrote: On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma mar...@buyways.nl wrote: - stemmers for many more different languages I don't want to hijack this thread, but i would like to know which languages you are interested in! Markus Jelsma - Technisch Architect - Buyways BV

Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
2010 14:42:18 Grant Ingersoll wrote: What would it be? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: good spell dictionary

2010-03-19 Thread Markus Jelsma
-spell-dictionary-tp27950854p27950921.html Sent from the Solr - User mailing list archive at Nabble.com. Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: some synonym clarifications

2010-03-19 Thread Markus Jelsma
it will help me understand the backslash \ also better. Thanks, Mark. On Thu, Mar 18, 2010 at 12:19 PM, Markus Jelsma mar...@buyways.nl wrote: Hi, Check out the wiki page on the SynonymFilterFactory. It gives a decent explanation on the subject. The backslash is just for escaping otherwise

Re: Issue w/ highlighting a String field

2010-03-23 Thread Markus Jelsma
highlighting is only working with tokenized fields, f.i., it worked with text and another type I defined. Is this true, or I'm making a mistake that is preventing me to have the highlighting option working on string? Thanks for your help. Markus Jelsma - Technisch Architect - Buyways BV http

Re: wikipedia and teaching kids search engines

2010-03-24 Thread Markus Jelsma
board with a sentence written largely on it, have the students physically *tokenize* the document by cutting it up and lexicographically building the term dictionary. Thoughts on taking it further welcome! Thanks all. Erik Markus Jelsma - Technisch Architect - Buyways BV http

Shards and (distributed) external file field

2012-01-09 Thread Markus Jelsma
Hi, Are there plans for bringing distributed capabilities for the external file field? I've not seen any hints for this in the work in distributed indexing, nor on the wiki or elsewhere. Will we be able to send a very large file and have it sliced up and have the values sent to the designated

[SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Hi, We're doing some tests with the latest trunk revision on a cluster of five high-end machines. There is one collection, five shards and one replica per shard on some other node. We're filling the index from a MapReduce job, 18 processes run concurrently. This is plenty when indexing to a

Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
the errors were coming in but it did not exceed 11k at any given time. How did you check the number of filedescriptors used? Did you get this number from the system info handler (http://hotname:8983/solr/admin/system?indent=on&wt=json) or somehow differently? -- Sami Siren -- Markus Jelsma

Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
On Wednesday 29 February 2012 17:52:55 Sami Siren wrote: On Wed, Feb 29, 2012 at 5:53 PM, Markus Jelsma markus.jel...@openindex.io wrote: Sami, As superuser: $ lsof | wc -l But, just now, i also checked the system handler and it told me: <str name="ulimit">(error executing: ulimit

Re: [SolrCloud] Too many open files - internal server error

2012-02-29 Thread Markus Jelsma
Thanks. They are set properly. But i misspelled the tomcat6 username in limits.conf :( On Wednesday 29 February 2012 18:08:55 Yonik Seeley wrote: On Wed, Feb 29, 2012 at 10:32 AM, Markus Jelsma markus.jel...@openindex.io wrote: The Linux machines have proper settings for ulimit and friends

[SolrCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma
Hi, Yesterday we had an issue with too many open files, which was solved because a username was misspelled. But there is still a problem with open files. We cannot successfully index a few million documents from MapReduce to a 5-node Solr cloud cluster. One of the problems is that after a

Re: [SolrCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma
a sneaky work-around :) Regards Bernd On 01.03.2012 11:36, Markus Jelsma wrote: Hi, Yesterday we had an issue with too many open files, which was solved because a username was misspelled. But there is still a problem with open files. We cannot successfully index a few

Re: nutch log

2012-03-03 Thread Markus Jelsma
Looks like you have a bad value where a boolean is expected in your solrconfig.xml. On Sat, 3 Mar 2012 16:09:11 +0100, alessio crisantemi alessio.crisant...@gmail.com wrote: is true. this is the slr problem: mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log Grave:

[SolrCloud] Slow indexing

2012-03-04 Thread Markus Jelsma
Hi, With auto-committing disabled we can now index many millions of documents in our test environment on a 5-node cluster with 5 shards and a replication factor of 2. The documents are uploaded from map/reduce. No significant changes were made to solrconfig and there are no update processors

Fwd: Re: [SolrCloud] Slow indexing

2012-03-04 Thread Markus Jelsma
...@openindex.io Reply-To: solr-user@lucene.apache.org hmm, looks like you are facing exactly the phenomenon I asked about. See my question here: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326 On Sun, Mar 4, 2012 at 9:24 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi

Re: [SolrCloud] Slow indexing

2012-03-05 Thread Markus Jelsma
On Sun, 4 Mar 2012 21:09:30 -0500, Mark Miller markrmil...@gmail.com wrote: On Mar 4, 2012, at 5:43 PM, Markus Jelsma wrote: everything stalls after it lists all segment files and that a ZK state change has occurred. Can you get a stack trace here? I'll try to respond to more tomorrow. What

Re: [SolrCloud] Slow indexing

2012-03-05 Thread Markus Jelsma
On Mon, 5 Mar 2012 11:26:20 -0500, Mark Miller markrmil...@gmail.com wrote: On Mar 5, 2012, at 10:01 AM, dar...@ontrenet.com wrote: If one of those 10 indexing nodes goes down or falls out of sync and comes back, does ZK block the state of indexing until that single node catches back up?

Re: Highlighting a font without bold or italic modes

2012-03-13 Thread Markus Jelsma
I would first attempt to underline or assign another colour if the scheme allows for it before increasing font size. On Mon, 12 Mar 2012 20:50:15 -0700, Lance Norskog goks...@gmail.com wrote: How do you highlight terms in languages without boldface or italic modes? Maybe raise the text size a

Re: Too many open files - lots of sockets

2012-03-14 Thread Markus Jelsma
entries are? Happy to post any more information as needed. Cheers, Colin -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2012-03-19 Thread Markus Jelsma
middle byte 0xe3 (at char #10, byte #-1). Could anyone tell me how to solve it? Thanks very much. -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350

Re: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-22 Thread Markus Jelsma
Can you try spellcheck.q ? On Thu, 22 Mar 2012 09:57:19 +0100, tom dev.tom.men...@gmx.net wrote: hi folks, i think i found a bug in the spellchecker but am not quite sure: this is the query i send to solr: http://lh:8983/solr/CompleteIndex/select? rows=0 echoParams=all spellcheck=true

Re: Commit Strategy for SolrCloud when Talking about 200 million records.

2012-03-23 Thread Markus Jelsma
. eg <autoCommit> <maxTime>15000</maxTime> <openSearcher>false</openSearcher> </autoCommit> -- -IC - Mark Miller lucidimagination.com -- Markus Jelsma - CTO - Openindex

Re: PageRank

2012-04-04 Thread Markus Jelsma
http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350

Re: Multi-words synonyms matching

2012-04-10 Thread Markus Jelsma
I'm doing wrong? I'm using Solr 3.4. Thanks, Elisabeth -- Markus Jelsma - CTO - Openindex

Re: Securing Solr under Tomcat - IP best way?

2012-04-10 Thread Markus Jelsma
on this because there seems to be a diverse range of opinions on this. Regards, James -- View this message in context: http://lucene.472066.n3.nabble.com/Securing-Solr-under-Tomcat-IP-best-way- tp3899929p3899929.html Sent from the Solr - User mailing list archive at Nabble.com. -- Markus Jelsma

Re: Securing Solr under Tomcat - IP best way?

2012-04-10 Thread Markus Jelsma
Accept only what you need (ports incoming/outgoing) for specific trusted clients. Decide for protocols such as ICMP, DNS, NTP, SSH and of course HTTP and drop all other coming in and reject going out. Beyond this you can also configure some protection for bad packets. There are plenty of

Re: URP's versus Cloud

2012-04-10 Thread Markus Jelsma
the RNI URP to the chain for xml updates -- <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"> <lst name="defaults"> <str name="update.chain">RNI</str> </lst> </requestHandler> -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600

Re: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
you can search more successfully? Michael -- Markus Jelsma - CTO - Openindex

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
. It still yields strange results when it emits tokens that are subwords of a subword. paul -- Markus Jelsma - CTO - Openindex

Re: Removing old documents

2012-05-01 Thread Markus Jelsma
. -- Markus Jelsma - CTO - Openindex

1MB file to Zookeeper

2012-05-03 Thread Markus Jelsma
Hi, We've increased ZooKeeper's znode size limit to accommodate some larger dictionaries and other files. It isn't the best idea to increase the maximum znode size. Any plans for splitting up larger files and storing them with multi? Does anyone have another suggestion? Thanks, Markus

Re: 1MB file to Zookeeper

2012-05-03 Thread Markus Jelsma
it autocompress files larger than N bytes? And how should we detect if data is compressed when reading from ZooKeeper? On Thursday 03 May 2012 14:04:31 Mark Miller wrote: On May 3, 2012, at 5:15 AM, Markus Jelsma wrote: Hi, We've increased ZooKeeper's znode size limit to accommodate some

RE: How to change the default format for tstamp?

2012-05-08 Thread Markus Jelsma
? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-the-default-format-for-tstamp-tp3970751p3971251.html Sent from the Solr - User mailing list archive at Nabble.com. -- Markus Jelsma - CTO - Openindex

Re: How add custom field to Nutch1.4?

2012-05-14 Thread Markus Jelsma
Please ask Nutch related questions only on the Nutch users mailing list. Thanks. On Sun, 13 May 2012 20:18:37 -0700 (PDT), forwardswing wangweiz...@sohu.com wrote: who can help me ? -- View this message in context:

SolrCloud deduplication

2012-05-18 Thread Markus Jelsma
Hi, Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is not functional anymore. The problem is that documents are passed multiple times through the URP and the digest field is added as if it is an multi valued field. If the field is not multi valued you'll get this

RE: SolrCloud deduplication

2012-05-18 Thread Markus Jelsma
Hi, Interesting! I'm watching the issues and will test as soon as they are committed. Thanks! -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 18-May-2012 16:05 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io Subject: Re: SolrCloud

RE: SolrCloud deduplication

2012-05-18 Thread Markus Jelsma
you're right. I'll test the patch as soon as possible. Thanks! -Original message- From:Chris Hostetter hossman_luc...@fucit.org Sent: Fri 18-May-2012 18:20 To: solr-user@lucene.apache.org Subject: RE: SolrCloud deduplication : Interesting! I'm watching the issues and will

RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
Hi, SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes the DistributedProcessor in the update chain. Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 18-May-2012 16:05 To: solr-user@lucene.apache.org; Markus Jelsma

RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
as the SignatureProcessor precedes the DistributedProcessor in the update chain. Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Fri 18-May-2012 16:05 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io Subject: Re: SolrCloud

RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
:39 AM, Markus Jelsma wrote: Hi again, It seemed to work fine but in the end duplicates are not overwritten. We first run the SignatureProcessor and then the DistributedProcessor. If we do it the other way around the digest field receives multiple values and throws errors

SolrCloud indexing blocks if node is recovering

2012-11-02 Thread Markus Jelsma
Hi, We just tested indexing some million docs from Hadoop to a 10 node 2 rep SolrCloud cluster with this week's trunk. One of the nodes gave an OOM but indexing continued without interruption. When i restarted the node indexing stopped completely, the node tried to recover - which was

Possible memory leak in recovery

2012-11-02 Thread Markus Jelsma
Hi, We wiped clean the data directories for one node. That node is never able to recover and regularly runs OOM. On another cluster (with an older build, September 10th) memory consumption on recovery is fairly low when recovering and with only a 250MB heap allocated it's easy to recover two

RE: trouble instantiating CloudSolrServer

2012-11-03 Thread Markus Jelsma
be out of whack? | | On Fri, Nov 2, 2012 at 6:38 AM, Markus Jelsma | markus.jel...@openindex.io wrote: | Hi, | | We use trunk but got SolrJ 4.0 from Maven. Creating an instance of | CloudSolrServer fails because its constructor calls a nonexistent | LBServer constructor, it attempts

RE: SolrCloud indexing blocks if node is recovering

2012-11-03 Thread Markus Jelsma
@lucene.apache.org Subject: Re: SolrCloud indexing blocks if node is recovering Doesn't sound right. Still have the logs? - Mark On Fri, Nov 2, 2012 at 9:45 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, We just tested indexing some million docs from Hadoop to a 10 node 2 rep

RE: trunk is unable to replicate between nodes ( Unable to download ... completely)

2012-11-05 Thread Markus Jelsma
of the trunk work around allowing any Directory impl to replicate. JIRA pls :) - Mark On Oct 30, 2012, at 12:29 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, We're testing again with today's trunk and using the new Lucene 4.1 format by default. When nodes are not restarted

RE: No lockType configured for NRTCachingDirectory

2012-11-05 Thread Markus Jelsma
file a JIRA to track looking into it. - Mark On Oct 31, 2012, at 11:30 AM, Markus Jelsma markus.jel...@openindex.io wrote: That's 5, the actual trunk/ -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wed 31-Oct-2012 16:29 To: solr-user@lucene.apache.org

RE: trouble instantiating CloudSolrServer

2012-11-05 Thread Markus Jelsma
CloudSolrServer I think the maven jars must be out of whack? On Fri, Nov 2, 2012 at 6:38 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, We use trunk but got SolrJ 4.0 from Maven. Creating an instance of CloudSolrServer fails because its constructor calls a nonexistent LBServer

RE: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException

2012-11-06 Thread Markus Jelsma
1, 2012, at 5:39 AM, Markus Jelsma markus.jel...@openindex.io wrote: File bug? Please. - Mark

RE: SolrCloud indexing blocks if node is recovering

2012-11-06 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4038 Still trying to gather the logs -Original message- From:Mark Miller markrmil...@gmail.com Sent: Sat 03-Nov-2012 14:17 To: Markus Jelsma markus.jel...@openindex.io Cc: solr-user@lucene.apache.org Subject: Re: SolrCloud indexing

positions and qf parameter in (e)dismax

2012-11-08 Thread Markus Jelsma
Hi, We do not want to store positions for some fields or omit term and positions (or just tf) for other fields. Obviously we don't need/want explicit phrase matching on the fields we want to configure without positions, but (e)dismax doesn't let us. All text fields configured in the QF

RE: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Markus Jelsma
Hi, Your Nutch schema likely points to the old EnglishPorterFilter that doesn't exist anymore. You can change that occurrence to PorterStemFilterFactory, that should fix the issue. -Original message- From:Antony Steiner ant.stei...@gmail.com Sent: Thu 08-Nov-2012 14:05 To:

RE: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Markus Jelsma
change anything. Should I post the full stacktrace? Regards Antony 2012/11/8 Markus Jelsma markus.jel...@openindex.io Hi, Your Nutch schema likely points to the old EnglishPorterFilter that doesn't exist anymore. You can change that occurrence to PorterStemFilterFactory

RE: Apache Nutch 1.5.1 + Apache Solr 4.0

2012-11-08 Thread Markus Jelsma
:54 To: Markus Jelsma markus.jel...@openindex.io; solr-user@lucene.apache.org Subject: Re: Apache Nutch 1.5.1 + Apache Solr 4.0 Hi, I just saw there is a schema-solr4.xml and a schema.xml in the nutch conf directory. But with both schemas I get the same errors when starting up solr. Here's

RE: positions and qf parameter in (e)dismax

2012-11-08 Thread Markus Jelsma
-Original Message- From: Markus Jelsma Sent: Thursday, November 08, 2012 5:01 AM To: solr-user@lucene.apache.org Subject: positions and qf parameter in (e)dismax Hi, We do not want to store positions for some fields or omit term and positions (or just tf) for other fields

Skewed IDF in multi lingual index

2012-11-08 Thread Markus Jelsma
Hi, We're testing a large multi lingual index with _LANG fields for each language and using dismax to query them all. Users provide, explicit or implicit, language preferences that we use for either additive or multiplicative boosting on the language of the document. However, additive boosting

RE: best practice for restarting the entire SolrCloud cluster

2012-11-08 Thread Markus Jelsma
Hi - i think you're seeing: https://issues.apache.org/jira/browse/SOLR-3993 -Original message- From:Bill Au bill.w...@gmail.com Sent: Thu 08-Nov-2012 21:16 To: solr-user@lucene.apache.org Subject: best practice for restarting the entire SolrCloud cluster I have a simple

RE: Skewed IDF in multi lingual index

2012-11-09 Thread Markus Jelsma
if you have any 3.x segments in your index. On Thu, Nov 8, 2012 at 11:13 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, We're testing a large multi lingual index with _LANG fields for each language and using dismax to query them all. Users provide, explicit or implicit

RE: How Index word document in solr.

2012-11-12 Thread Markus Jelsma
hi - Check the Extracting Request Handler manual: http://wiki.apache.org/solr/ExtractingRequestHandler -Original message- From:veena rani veenara...@gmail.com Sent: Mon 12-Nov-2012 10:09 To: solr-user@lucene.apache.org Subject: How Index word document in solr. Hi, Please

RE: Solr Indexing MAX FILE LIMIT

2012-11-13 Thread Markus Jelsma
Hi - instead of trying to make the system ingest such large files perhaps you can split the files in many small pieces. -Original message- From:mitra mitra.re...@ornext.com Sent: Tue 13-Nov-2012 09:05 To: solr-user@lucene.apache.org Subject: Solr Indexing MAX FILE LIMIT Hello
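Splitting a large input into smaller pieces, as suggested above, might look like the following. This is a minimal sketch assuming line-oriented input with one record per line; the thread snippet does not say what format the file has, and the helper name `split_lines` is hypothetical.

```python
from itertools import islice


def split_lines(lines, chunk_size):
    """Yield successive lists of at most `chunk_size` lines from an iterable."""
    iterator = iter(lines)
    while True:
        # Pull the next batch lazily so the whole input never sits in memory.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

An open file handle can be passed as `lines`, and each yielded chunk can then be written to its own file or posted as a separate update request.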

RE: Solr 4.0 indexing performance

2012-11-15 Thread Markus Jelsma
Hi - you're likely seeing a drop in performance because of durability which is enabled by default via a transaction log. When disabled 4.0 is iirc slightly faster than 3.x. -Original message- From:Nils Weinander nils.weinan...@gmail.com Sent: Thu 15-Nov-2012 10:35 To:

Report exception: too many close count

2012-11-16 Thread Markus Jelsma
Hi, I stumbled upon SOLR-4037 again and this time restarting with a clean Zookeeper gave a very interesting error log: 2012-11-16 11:05:51,876 INFO [solr.core.SolrCore] - [Thread-4] - : Closing SolrCoreState 2012-11-16 11:05:51,876 INFO [solr.update.DefaultSolrCoreState] - [Thread-4] - :

RE: consistency in SolrCloud replication

2012-11-16 Thread Markus Jelsma
Solr provides availability and is tolerant to partitioning, so that leaves consistency. It is eventually consistent. -Original message- From:Bill Au bill.w...@gmail.com Sent: Fri 16-Nov-2012 15:00 To: solr-user@lucene.apache.org Subject: Re: consistency in SolrCloud replication

Reduce QueryComponent prepare time

2012-11-16 Thread Markus Jelsma
Hi, We're seeing high prepare times for the QueryComponent, obviously due to the vast amount of field and queries. It's common to have a prepare time of 70-80ms while the process times drop significantly due to warmed searchers, OS cache etc. The prepare time is a recurring issue and i'd hope

RE: Reduce QueryComponent prepare time

2012-11-19 Thread Markus Jelsma
I'd also like to know which parts of the entire query constitute the prepare time and if it would matter significantly if we extend the edismax plugin and hardcode the parameters we pass into (reusable) objects. Thanks, Markus -Original message- From:Markus Jelsma

RE: Reduce QueryComponent prepare time

2012-11-20 Thread Markus Jelsma
, 2012 at 3:08 PM, Markus Jelsma markus.jel...@openindex.iowrote: I'd also like to know which parts of the entire query constitute the prepare time and if it would matter significantly if we extend the edismax plugin and hardcode the parameters we pass into (reusable) objects. Thanks

RE: Reduce QueryComponent prepare time

2012-11-21 Thread Markus Jelsma
. Nothing more useful from me. Bye. On Tue, Nov 20, 2012 at 7:01 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hi, Profiling pointed me directly to the method i already suspected: ExtendedDismaxQParser.parse(). I added manual timers in parts of the method and made sure

Recip m parameter to take function value

2012-11-21 Thread Markus Jelsma
Hi, We need the recip function's m-parameter to take other functions e.g. recip(dateField, div(1,prod(1,2)), 1,1) but ValueSourceParser wants to read a float instead. How could we modify either Solr or Lucene as well to take functions for that parameter? I've been looking at the various
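For reference, Solr documents recip(x,m,a,b) as a/(m*x+b). The sketch below is plain Python, not Solr code; it only shows what the expression in the message would evaluate to if the m-parameter accepted a nested function. The Python helpers `recip`, `div` and `prod` are stand-ins named after the Solr functions.

```python
def recip(x, m, a, b):
    """Reciprocal function as documented for Solr function queries: a / (m*x + b)."""
    return a / (m * x + b)


def div(x, y):
    """Stand-in for Solr's div(x, y)."""
    return x / y


def prod(*values):
    """Stand-in for Solr's prod(...): the product of all arguments."""
    result = 1.0
    for value in values:
        result *= value
    return result


# m computed from nested functions, mirroring div(1,prod(1,2)) in the message.
m = div(1, prod(1, 2))
```

With this `m`, `recip(x, m, 1, 1)` decays half as fast in `x` as it would with `m = 1`.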

RE: Solr UIMA with KEA

2012-11-22 Thread Markus Jelsma
See: http://nutch.apache.org/apidocs-2.1/org/apache/nutch/crawl/AdaptiveFetchSchedule.html -Original message- From:nutchsolruser nutchsolru...@gmail.com Sent: Fri 23-Nov-2012 06:53 To: solr-user@lucene.apache.org Subject: Solr UIMA with KEA Is there any way we can extract tags

RE: Solr UIMA with KEA

2012-11-22 Thread Markus Jelsma
Sorry, wrong list :) -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Fri 23-Nov-2012 08:32 To: solr-user@lucene.apache.org Subject: RE: Solr UIMA with KEA See: http://nutch.apache.org/apidocs-2.1/org/apache/nutch/crawl/AdaptiveFetchSchedule.html

RE: Spellchecker for multiple sites (and languages?)

2012-11-26 Thread Markus Jelsma
Hi - check the new spellchecker collate options. It limits spellchecker suggestions to the fq restrictions. If you filter on specific hosts, the spellchecker will only provide suggestions that are found in that host. Same goes for language.

RE: SolrCloud(5x) - Errors while recovering

2012-11-27 Thread Markus Jelsma
Seems you got this issue: https://issues.apache.org/jira/browse/SOLR-4032 -Original message- From:deniz denizdurmu...@gmail.com Sent: Tue 27-Nov-2012 05:04 To: solr-user@lucene.apache.org Subject: SolrCloud(5x) - Errors while recovering Here is briefly what is happening: I have

RE: positions and qf parameter in (e)dismax

2012-11-27 Thread Markus Jelsma
Hi - no we're not getting any errors because we enabled positions on all fields that are also listed in the qf-parameter. If we don't, and send a phrase query we would get an error such as: java.lang.IllegalStateException: field h1 was indexed without position data; cannot run PhraseQuery

RE: SolrCloud(5x) - Errors while recovering

2012-11-27 Thread Markus Jelsma
It only seems to happen if a node dies while indexing. -Original message- From:deniz denizdurmu...@gmail.com Sent: Tue 27-Nov-2012 10:34 To: solr-user@lucene.apache.org Subject: RE: SolrCloud(5x) - Errors while recovering another update having 300K docs causes the same error...

RE: Extreme index size reduction on 4.1-SNAPSHOT?

2012-11-27 Thread Markus Jelsma
Hi, please check this issue: https://issues.apache.org/jira/browse/LUCENE-4226 But it is enabled because of: https://issues.apache.org/jira/browse/LUCENE-4509 Since it's suddenly default you would have to completely wipe the index and reindex the data, at least i had to, because of numerous

RE: positions and qf parameter in (e)dismax

2012-11-28 Thread Markus Jelsma
a phrase and the field does not have positions, a BooleanQuery with MUST would be generated instead of the PhraseQuery. -- Jack Krupansky -Original Message- From: Markus Jelsma Sent: Tuesday, November 27, 2012 4:27 AM To: solr-user@lucene.apache.org Subject: RE: positions and qf

RE: Best way to increase boost to results that 'starts with' search keyword

2012-11-30 Thread Markus Jelsma
This issue adds the SpanFirstQuery to edismax. https://issues.apache.org/jira/browse/SOLR-3925 It unfortunately cannot produce progressively higher boosts if the term is closer to the beginning. -Original message- From:Jack Krupansky j...@basetechnology.com Sent: Fri 30-Nov-2012

RE: Exceptions in branch_4x log

2012-11-30 Thread Markus Jelsma
Hi, try updating your check out, i think that's fixed now. https://issues.apache.org/jira/browse/SOLR-4117 -Original message- From:Shawn Heisey s...@elyograg.org Sent: Fri 30-Nov-2012 22:21 To: solr-user@lucene.apache.org Subject: Exceptions in branch_4x log This is branch_4x,

The shard called `properties`

2012-12-05 Thread Markus Jelsma
Hi, We're suddenly seeing a shard called `properties` in the cloud graph page when testing today's trunk with a clean Zookeeper data directory. Any idea where it comes from? We have not changed the solr.xml on any node. Thanks
