HOWTO get a working copy of SOLR?

2010-06-15 Thread Bernd Fehling
Dear list, this may sound stupid, but how do I get a fully working copy of SOLR? What I have tried so far: - started with LucidWorks SOLR. Installs fine, runs fine, but has an old Tika version and can only handle some PDFs. - changed to SOLR trunk. Installs fine, runs fine, but luke 1.0.1 complains

Re: HOWTO get a working copy of SOLR?

2010-06-16 Thread Bernd Fehling
Sixten Otto wrote: On Tue, Jun 15, 2010 at 12:58 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: - changed to SOLR branch_3x. Installs fine, runs fine, luke works fine, but the extraction with /update/extract (ExtractingRequestHandler) only returns the metadata but not the content

Re: How to Debug Sol-Code in Eclipse ?!

2010-08-23 Thread Bernd Fehling
can nobody help me or want :D As someone already said: - install Eclipse - add the Jetty Webapp Plugin to Eclipse - add the svn plugin to Eclipse - check out the repository from trunk with svn - change to the lucene dir and run ant package - change to the solr dir and run ant dist - setup with Run

Re: Different analyzers for dfferent documents in different languages?

2010-09-22 Thread Bernd Fehling
Actually, this is one of the biggest disadvantages of Solr for multilingual content. Solr is field based, which means you have to know the language _before_ you feed the content to a specific field and process the content for that field. This results in having separate fields for each language.

Re: Migrating to Solr

2010-02-25 Thread Bernd Fehling
Hi list, is it true that no downloaded copy of the documentprocessor is available anywhere? Regards, Bernd Bernd Fehling wrote: Was anyone able to get a copy of: http://sesat.no/svn/sesat-documentprocessor/ Unfortunately it is offline. Would be pleased to get a copy. Regards, Bernd

Re: Query regarding solr custom sort order

2012-01-04 Thread Bernd Fehling
Hi, I suggest using the following fieldType for your field: <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> Regards Bernd On 04.01.2012 14:40, umaswayam wrote: Hi, We want to sort our records based on some sequence which is like 1 2 3 4 5 6 7 8 9 10
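A minimal sketch (plain Python, not Solr code) of why a sortable integer type matters for a sequence like 1 2 3 ... 10: values compared as plain strings sort lexicographically. The sample values are made up for illustration.

```python
# Hypothetical sequence values as they might arrive as strings.
values = ["1", "2", "3", "10", "11", "9"]

# Plain string comparison puts "10" and "11" before "2".
as_strings = sorted(values)

# Comparing as integers gives the numeric sequence order.
as_integers = sorted(values, key=int)

print(as_strings)
print(as_integers)
```

The same effect is what a numeric, sortable field type is meant to avoid on the index side.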

Re: Query regarding solr custom sort order

2012-01-06 Thread Bernd Fehling
Hi Uma, I don't understand what you're looking for. Do you need to sort on fields of type double with precision 2 or what? In your example you were talking about 1 2 3 4 5 6 7 8 9 10 11 12 13 14. Regards, Bernd On 06.01.2012 07:11, umaswayam wrote: Hi Bernd, The column which comes

exception while loading with DIH multi-threaded

2012-01-11 Thread Bernd Fehling
Hi list, after changing DIH to multi-threaded (4 threads) I sometimes get an exception. This is not always the case and I never had any problems with single-threaded at all. I'm using Solr 3.5 but also tried branch_3x (3.6) and could see this with both versions. Don't know why this comes up

Re: exception while loading with DIH multi-threaded

2012-01-11 Thread Bernd Fehling
After browsing through the issues it looks like this is related to https://issues.apache.org/jira/browse/SOLR-2694 On 11.01.2012 14:08, Bernd Fehling wrote: Hi list, after changing DIH to multi-threaded (4 threads) I sometimes get an exception. This is not always the case and I never

Re: exception while loading with DIH multi-threaded

2012-01-11 Thread Bernd Fehling
Hi Mikhail, thanks for pointing me to the issue. Regards, Bernd On 11.01.2012 21:47, Mikhail Khludnev wrote: FYI, it's https://issues.apache.org/jira/browse/SOLR-2804 I'm trying to address it. On Wed, Jan 11, 2012 at 5:49 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: After

Re: Synonym configuration not working?

2012-01-15 Thread Bernd Fehling
Yes and No. If you use the synonyms functionality out of the box you have to do it at index time. But if you use it at query time, like we do, you have to do some programming. We have connected a thesaurus which actually uses the synonyms functionality at query time. There are some pitfalls to take care

SolrException with branch_3x

2012-01-31 Thread Bernd Fehling
On January 11th I downloaded branch_3x with svn into eclipse (indigo). Compiled and tested it without problems. Today I updated my branch_3x from the repository. It compiled fine, but now I get a SolrException when starting. Jan 31, 2012 1:50:15 PM org.apache.solr.core.SolrCore initListeners INFO: [] Added

SOLVED: SolrException with branch_3x

2012-01-31 Thread Bernd Fehling
After changing the lines suggested below and compiling, branch_3x runs fine now. SolrException is gone. Regards, Bernd On 31.01.2012 14:21, Bernd Fehling wrote: On January 11th I downloaded branch_3x with svn into eclipse (indigo). Compiled and tested it without problems. Today I updated

Re: usage of /etc/jetty.xml when debugging Solr in Eclipse

2012-02-08 Thread Bernd Fehling
Hi, run-jetty-run issue #9: ... In the VM Arguments of your launch configuration set -Drjrxml=./jetty.xml If jetty.xml is in the root of your project it will be used (you can also use a fully qualified path name). The UI port, context and WebApp dir are ignored, since you can define them in

Re: need to support bi-directional synonyms

2012-02-22 Thread Bernd Fehling
Use sprayer, washer http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory Regards Bernd On 23.02.2012 07:03, remi tassing wrote: Same question here... On Wednesday, February 22, 2012, geeky2gee...@hotmail.com wrote: hello all, i need to support the
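A rough sketch of the idea behind a comma-separated synonym line such as "sprayer, washer": every term on the line is treated as equivalent to every other, which is what makes the mapping bi-directional. This is plain Python for illustration only, not how SynonymFilterFactory is implemented; the `expand` helper is hypothetical.

```python
def expand(line):
    """Map every term on a comma-separated line to the full group of terms."""
    group = [term.strip() for term in line.split(",") if term.strip()]
    return {term: group for term in group}


synonyms = expand("sprayer, washer")

# Both directions resolve to the same group of equivalent terms.
print(synonyms["sprayer"])
print(synonyms["washer"])
```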

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Bernd Fehling
What is netstat telling you about the connections on the servers? Any connections in CLOSE_WAIT (passive close) hanging? Saw this on my servers last week. Used a little program to spoof a local connection on those servers' ports and was able to trick the TCP stack into closing those connections. It

CLOSE_WAIT connections

2012-03-27 Thread Bernd Fehling
Hi list, I have looked into the CLOSE_WAIT problem and created an issue with a patch to fix this. A search for CLOSE_WAIT shows that there are many Apache projects hit by this problem. https://issues.apache.org/jira/browse/SOLR-3280 Can someone recheck the patch (it belongs to SnapPuller)

Re: [Announce] Solr 4.0 with RankingAlgorithm 1.4.1, NRT now supports both RankingAlgorithm and Lucene

2012-03-29 Thread Bernd Fehling
Nothing against RankingAlgorithm and your work, which sounds great, but I think that YOUR Solr 4.0 might confuse some Solr users and/or newbies. As far as I know the next official release will be 3.6. So your Solr 4.0 is a trunk snapshot or what? If so, which revision number? Or have you done

Re: solr 3.5 taking long to index

2012-04-12 Thread Bernd Fehling
There were some changes in solrconfig.xml between solr3.1 and solr3.5. Always read CHANGES.txt when switching to a new version. Also helpful is comparing both versions of solrconfig.xml from the examples. Are you sure you need a MaxPermSize of 5g? Use jvisualvm to see what you really need. This

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
You might have a look at: http://www.basistech.com/lucene/ On 12.04.2012 11:52, Michael Ludwig wrote: Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
Paul, nearly two years ago I requested an evaluation license and tested BASIS Tech Rosette for Lucene Solr. It worked excellently, but the price was much, much too high. Yes, they also have compound analysis for several languages including German. Just configure your pipeline in solr and set up the

HowTo getDefaultOperator with solr3.6?

2012-04-16 Thread Bernd Fehling
I'm trying to get the default operator of a schema in solr 3.6 but unfortunately everything is deprecated. The API solr 3.6 says: getQueryParserDefaultOperator() - Method in class org.apache.solr.schema.IndexSchema Deprecated. use getSolrQueryParser().getDefaultOperator()

Problems with edismax parser and solr3.6

2012-04-18 Thread Bernd Fehling
I just looked through my logs of solr 3.6 and saw several 0 hits which were not seen with solr 3.5. While tracing this down it turned out that edismax doesn't like queries of type ...q=(text:ide)... any more. If there are parentheses around the query term, edismax fails with solr 3.6. Can anyone

debugging junit test with eclipse

2012-04-24 Thread Bernd Fehling
I have tried all hints from the internet for debugging a junit test of solr 3.6 under eclipse but didn't succeed. eclipse and everything is running, compiling, debugging with runjettyrun. Tests have no errors. Ant from the command line is also running with ivy, e.g. ant -Dtestmethod=testUserFields

Re: Multi-words synonyms matching

2012-05-15 Thread Bernd Fehling
-- Bernd Fehling, Dipl.-Inform. (FH), Universitätsbibliothek Bielefeld, Universitätsstr. 25, 33615 Bielefeld, Tel. +49 521 106-4060, Fax. +49 521 106-4052, bernd.fehl...@uni-bielefeld.de BASE

Re: Out Of Memory =( Too many cores on one server?

2012-11-16 Thread Bernd Fehling
I guess you should give JVM more memory. When starting to find a good value for -Xmx I oversized and set it to Xmx20G and Xms20G. Then I monitored the system and saw that JVM is between 5G and 10G (java7 with G1 GC). Now it is finally set to Xmx11G and Xms11G for my system with 1 core and 38

Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar

2012-11-18 Thread Bernd Fehling
) at java.awt.EventDispatchThread.run(EventDispatchThread.java:122) o any ideas? I've created another index with lucene 4.0 and this luke opens the index well. thanks in advance -- Bernd Fehling Bielefeld University

Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar

2012-11-19 Thread Bernd Fehling
I just downloaded, compiled and opened an optimized solr 4.0 index in read only without problems. Could browse through the docs, search with different analyzers, ... Looks good. On 19.11.2012 08:49, Toke Eskildsen wrote: On Mon, 2012-11-19 at 08:10 +0100, Bernd Fehling wrote: I think

Re: Multi word synonyms

2012-11-29 Thread Bernd Fehling
There are also other solutions: Multi-word synonym filter (synonym expansion) https://issues.apache.org/jira/browse/LUCENE-4499 Since Solr 3.4 I have had my own solution, which might become obsolete once LUCENE-4499 is in a released version.

DefaultSolrParams ?

2012-11-30 Thread Bernd Fehling
Dear list, after going from 3.6 to 4.0 I see exceptions in my logs. It turned out that somehow the q-parameter was empty. With 3.6 the q.alt in the solrconfig.xml worked as fallback but now with 4.0 I get exceptions. I use it like this: SolrParams params = req.getParams(); String q =

Re: DefaultSolrParams ?

2012-12-02 Thread Bernd Fehling
Hi Hoss, my config has definitely not changed and it worked with 3.6 and 3.6.1. Yes, I have a custom plugin, and if q was empty with 3.6 it automatically picked q.alt from solrconfig.xml. This was all done with params.get(). With 4.x this is gone due to some changes in DefaultSolrParams(?). Which is
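One way a plugin could make the fallback explicit instead of relying on implicit defaults is sketched below. This is plain Python for illustration, not the Solr API; `effective_query` and the two dictionaries are hypothetical stand-ins for request parameters and the defaults configured in solrconfig.xml.

```python
def effective_query(request_params, config_defaults):
    """Return q if it is non-empty, otherwise fall back to q.alt."""
    q = (request_params.get("q") or "").strip()
    if q:
        return q
    # Prefer a q.alt sent with the request, then the configured default.
    return request_params.get("q.alt") or config_defaults.get("q.alt")


defaults = {"q.alt": "*:*"}  # assumed default, for illustration only
print(effective_query({"q": ""}, defaults))
print(effective_query({"q": "solr"}, defaults))
```

Handling the empty case in the plugin itself keeps the behaviour the same regardless of how defaults are merged by the surrounding framework.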

Re: OutOfMemoryError | While Faceting Query

2012-12-07 Thread Bernd Fehling
Hi Uwe, sorting should be well prepared. First rough check is fieldCache. You can see it with SolrAdmin Stats. The insanity_count there should be 0 (zero). Only sort on fields which are prepared for sorting and make sense to be sorted. Only facet on fields where it makes sense. I've seen

Re: jconsole over jmx - should threads be visible?

2012-12-19 Thread Bernd Fehling
Hi Shawn, actually I use munin for monitoring but just checked with jvisualvm which also runs fine for remote monitoring. You might try the following: http://www.codefactorycr.com/java-visualvm-to-profile-a-remote-server.html You have to: - generate a policy file on the server to be monitored -

thanks for solr 4.1

2013-01-29 Thread Bernd Fehling
Now this must be said, thanks for solr 4.1 (and lucene 4.1)! Great improvements compared to 4.0. After building the first 4.1 index I thought the index was broken, but had no error messages anywhere. Why did I think it was damaged? The index size went down from 167 GB (solr 4.0) to 115 GB (solr

Solr4.1 changing result order FIFO to LIFO

2013-01-31 Thread Bernd Fehling
Hi list, I noticed that the result order is FIFO if documents have the same score. I think this is due to the fact that documents which are indexed later get a higher internal document ID and the output for documents with the same score starts with the lowest internal document ID and goes up.
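The described behaviour can be illustrated with a small sketch: hits are ordered by score, and ties are broken by internal document ID. This is plain Python with made-up IDs and scores, not Solr code; reversing the tie-break gives the LIFO order.

```python
# (internal doc id, score) pairs; values are invented for illustration.
hits = [(3, 1.0), (1, 1.0), (2, 2.5), (0, 1.0)]

# FIFO: highest score first, ties resolved by ascending doc id.
fifo = sorted(hits, key=lambda hit: (-hit[1], hit[0]))

# LIFO: highest score first, ties resolved by descending doc id.
lifo = sorted(hits, key=lambda hit: (-hit[1], -hit[0]))

print([doc_id for doc_id, _ in fifo])
print([doc_id for doc_id, _ in lifo])
```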

expert question about SolrReplication

2013-02-01 Thread Bernd Fehling
A question to the experts, why is the replicated index copied from its temporary location (index.x) to the real index directory and NOT moved? Copying over 100s of gigs takes some time, moving is just changing the file system link. Also, instead of first deleting the old index, why not

Re: expert question about SolrReplication

2013-02-03 Thread Bernd Fehling
On 02.02.2013 03:48, Yonik Seeley wrote: On Fri, Feb 1, 2013 at 4:13 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: A question to the experts, why is the replicated index copied from its temporary location (index.x) to the real index directory and NOT moved

replication problems with solr4.1

2013-02-11 Thread Bernd Fehling
Hi list, after upgrading from solr4.0 to solr4.1 and running it for two weeks now it turns out that replication has problems and unpredictable results. My installation is single index 41 million docs / 115 GB index size / 1 master / 3 slaves. - the master builds a new index from scratch once a week

Re: replication problems with solr4.1

2013-02-12 Thread Bernd Fehling
to slave, more like a sync? On 11.02.2013 09:29, Bernd Fehling wrote: Hi list, after upgrading from solr4.0 to solr4.1 and running it for two weeks now it turns out that replication has problems and unpredictable results. My installation is single index 41 million docs / 115 GB index size

Re: replication problems with solr4.1

2013-02-13 Thread Bernd Fehling
OK, then index generation and index version can't be used to verify that master and slave index are in sync. What else is possible? The strange thing is if master is 2 or more generations ahead of slave then it works! With your logic the slave must _always_ be one generation

Re: Slaves always replicate entire index Index versions

2013-02-27 Thread Bernd Fehling
Maybe the info about index version is pulled from the repeater's data/replication.properties file and the content of that file is wrong. Had something similar and the only solution for me was deleting the replication.properties file. But no guarantee about this. Actually the replication is pretty

Re: how often do you boys restart your tomcat?

2011-07-27 Thread Bernd Fehling
Till now I used jetty and got 2 weeks as the longest uptime until OOM. I just switched to tomcat6 and will see how that one behaves, but I think it's not a problem of the servlet container. Solr is pretty unstable with a huge database. Actually this can't be blamed directly on Solr; it is a

Re: how often do you boys restart your tomcat?

2011-07-27 Thread Bernd Fehling
It is definitely Lucene's fieldCache causing the trouble. Restart your solr and monitor it with jvisualvm, especially OldGen heap. When it gets to 100 percent filled use jmap to dump heap of your system. Then use Eclipse Memory Analyzer http://www.eclipse.org/mat/ and open the heap dump. You will

segment.gen file is not replicated

2011-07-29 Thread Bernd Fehling
Dear list, is there a deeper logic behind why the segment.gen file is not replicated with solr 3.2? Is it obsolete because I have a single segment? Regards, Bernd

Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread Bernd Fehling
Any JAVA_OPTS set? Do not use -XX:+OptimizeStringConcat or -XX:+AggressiveOpts flags. On 02.08.2011 12:01, alexander sulz wrote: Hello folks, I'm using the latest stable Solr release - 3.3 and I encounter strange phenomena with it. After about 19 hours it just crashes, but I can't find

performance crossover between single index and sharding

2011-08-02 Thread Bernd Fehling
Is there any knowledge on this list about the performance crossover between a single index and sharding and when to change from a single index to sharding? E.g. if index size is larger than 150GB and number of docs is more than 25 million then it is better to change from single index to sharding and

Re: performance crossover between single index and sharding

2011-08-03 Thread Bernd Fehling
On 02.08.2011 21:00, Shawn Heisey wrote: ... I did try some early tests with a single large index. Performance was pretty decent once it got warmed up, but I was worried about how it would perform under a heavy load, and how it would cope with frequent updates. I never really got very far

Re: performance crossover between single index and sharding

2011-08-04 Thread Bernd Fehling
: Replies inline. On 8/3/2011 2:24 AM, Bernd Fehling wrote: To show that I compare apples and oranges here is my previous FAST Search setup: - one master server (controlling, logging, search dispatcher) - six index servers (4.25 million docs per server, 5 slices per index) (searching and indexing

Re: segment.gen file is not replicated

2011-08-04 Thread Bernd Fehling
I have now updated to solr 3.3 but segment.gen is still not replicated. Any idea why, is it a bug or a feature? Should I write a jira issue for it? Regards Bernd On 29.07.2011 14:10, Bernd Fehling wrote: Dear list, is there a deeper logic behind why the segment.gen file is not replicated

Re: segment.gen file is not replicated

2011-08-04 Thread Bernd Fehling
and then replicated, but segment.gen was not replicated. According to your explanation NFS would not be reliable any more. So my guess is either a bug or a feature, and the experts will know :-) Regards Bernd Mike McCandless http://blog.mikemccandless.com On Thu, Aug 4, 2011 at 3:38 AM, Bernd Fehling bernd.fehl

Re: performance crossover between single index and sharding

2011-08-04 Thread Bernd Fehling
, and not a 32 bit Java? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: Thursday, August 04, 2011 2:39 AM To: solr-user

string cut-off filter?

2011-08-08 Thread Bernd Fehling
Hi list, is there a string cut-off filter to limit the length of a KeywordTokenized string? So the string should not be dropped, only limited to a certain length. Regards Bernd
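The requested behaviour, sketched in plain Python for illustration (this is not a Solr filter; `cut_off` is a hypothetical helper): an over-long token is shortened to a maximum length instead of being dropped, and shorter tokens pass through unchanged.

```python
def cut_off(token, max_length):
    """Limit a token to max_length characters without dropping it."""
    if max_length < 0:
        raise ValueError("max_length must not be negative")
    return token[:max_length]


print(cut_off("Universitätsbibliothek", 5))
print(cut_off("abc", 10))
```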

Re: string cut-off filter?

2011-08-09 Thread Bernd Fehling
of a KeywordTokenized string? So the string should not be dropped, only limited to a certain length. Regards Bernd -- Bernd Fehling Universitätsbibliothek Bielefeld Dipl.-Inform. (FH) Universitätsstr

question about query parsing

2011-08-09 Thread Bernd Fehling
Hi list, while searching with debug on I see strange query parsing: <str name="rawquerystring">identifier:ub.uni-bielefeld.de</str> <str name="querystring">identifier:ub.uni-bielefeld.de</str> <str name="parsedquery"> +MultiPhraseQuery(identifier:(ub.uni-bielefeld.de ub) uni bielefeld de) </str> <str

Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Bernd Fehling
From what I see on my slaves, yes. After replication has finished and the new index is in place and a new reader has started I always have a write.lock file in my index directory on slaves, even though the index on master is optimized. Regards Bernd On 10.08.2011 09:12, Pranav Prakash wrote:

Re: Is optimize needed on slaves if it replicates from optimized master?

2011-08-10 Thread Bernd Fehling
Sure, no optimizing is actually needed on the slave, but after calling optimize on the slave the write.lock will be removed. So why is the replication process not doing this? Regards Bernd On 10.08.2011 10:57, Shalin Shekhar Mangar wrote: On Wed, Aug 10, 2011 at 1:11 PM, Bernd Fehling

Re: Solr 3.3 crashes after ~18 hours?

2011-08-11 Thread Bernd Fehling
-- Bernd Fehling, Dipl.-Inform. (FH), Universitätsbibliothek Bielefeld, Universitätsstr. 25, Tel. +49 521 106-4060, Fax. +49 521 106-4052, bernd.fehl...@uni-bielefeld.de

sorting issue with solr 3.3

2011-08-12 Thread Bernd Fehling
It turned out that there is a sorting issue with solr 3.3. As far as I could trace it down currently: 4 docs in the index and a search for *:* sorting on field dccreator_sort in descending order

Re: sorting issue with solr 3.3

2011-08-13 Thread Bernd Fehling
The issue was located in a 31 million docs index and I have already reduced it to a reproducible 4 documents index. It is stock solr 3.3.0. Yes, the documents are also in the wrong order, like the field sort values. Just added only the field sort values to the email to keep it short. I will produce a

Re: sorting issue with solr 3.3

2011-08-15 Thread Bernd Fehling
I have created an issue with test attached. https://issues.apache.org/jira/browse/SOLR-2713 Will try to figure out what's going wrong. Regards Bernd http://www.base-search.net/ On 13.08.2011 16:20, Bernd Fehling wrote: The issue was located in a 31 million docs index and I have already

commit to jira and change Status and Resolution

2011-09-01 Thread Bernd Fehling
Hi list, I have fixed an issue and created a patch (SOLR-2726) but how do I change Status and Resolution in jira? And how do I commit this, any idea? Regards, Bernd

Re: Unable to generate trace

2011-09-08 Thread Bernd Fehling
How about using jmap or jvisualvm? Or even connecting with eclipse to the process for live analysis? On 08.09.2011 11:07, Rohit wrote: Nope not getting anything here also. Regards, Rohit -Original Message- From: Jerry Li [mailto:zongjie...@gmail.com] Sent: 08 September 2011 08:09

skipping parts of query analysis for some queries

2011-09-30 Thread Bernd Fehling
I need to skip some query analysis steps for some queries. Or more precisely, make it switchable with a query parameter. Use case: <fieldType name="text_spec" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <charFilter

accessing the query string from inside TokenFilter

2011-10-25 Thread Bernd Fehling
Dear list, while writing some TokenFilter for my analyzer chain I need access to the query string from inside of my TokenFilter for some comparison, but the Filters are working with a TokenStream and get separate Tokens. Currently I couldn't get any access to the query string. Any idea how to

Report about Solr and multilingual Thesaurus

2011-11-21 Thread Bernd Fehling
Dear list, just in case you are planning to integrate or combine a thesaurus with Solr the following report might help you. BASE - Solr and the multilingual EuroVoc Thesaurus http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html In brief: It explains how a working solution is possible

Re: cache monitoring tools?

2011-12-08 Thread Bernd Fehling
Hi Otis, I can't find the download for the free SPM. What Hardware and OS do I need for installing SPM to monitor my servers? Regards Bernd On 07.12.2011 18:47, Otis Gospodnetic wrote: Hi Dmitry, You should use SPM for Solr - it exposes all Solr metrics and more (JVM, system info, etc.)

KStemmer for Solr

2010-10-11 Thread Bernd Fehling
Because I'm using solr from trunk and not from lucid imagination I was missing KStemmer. So I decided to add this stemmer to my installation. After some modifications KStemmer is now working fine as stand-alone. Now I have a KStemmerFilter. Next will be to write the KStemmerFilterFactory. I

DIH delta-import question

2010-10-15 Thread Bernd Fehling
Dear list, I'm trying to delta-import with datasource FileDataSource and processor FileListEntityProcessor. I want to load only files which are newer than dataimport.properties - last_index_time. It looks like newerThan=${dataimport.last_index_time} has no effect. Can it be that

Re: How to use polish stemmer - Stempel - in schema.xml?

2010-10-28 Thread Bernd Fehling
Hi Jakub, I have ported the KStemmer for use in most recent Solr trunk version. My stemmer is located in the lib directory of Solr solr/lib/KStemmer-2.00.jar because it belongs to Solr. Write it as FilterFactory and use it as Filter like: filter

Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-02 Thread Bernd Fehling
? :) Cheers, Jakub Godawa. 2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, I have ported the KStemmer for use in most recent Solr trunk version. My stemmer is located in the lib directory of Solr solr/lib/KStemmer-2.00.jar because it belongs to Solr. Write

Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-02 Thread Bernd Fehling
res/tables: readme.txt stemmer_1000.out stemmer_100.out stemmer_2000.out stemmer_200.out stemmer_500.out stemmer_700.out 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de: Hi Jakub, if you unzip your stempel-1.0.jar do you have the required directory structure and file

result of filtered field not indexed

2010-11-23 Thread Bernd Fehling
Dear list, solr/lucene has a strange problem. I'm currently using apache-solr-4.0-2010-10-12_08-05-48 I have written a MessageDigest for fields which generally works. Part of my schema.xml is: ... <fieldType name="text_md" class="solr.TextField"> <analyzer type="index"> <tokenizer

Re: result of filtered field not indexed

2010-11-24 Thread Bernd Fehling
Hi Rita, thanks for the advice, one problem solved. source start,end is now set to the correct value by the filter. After further debugging it looks like this is a bug in the Lucene indexer. I'm surprised that no one ever noticed this... Kind regards, Bernd On 23.11.2010 09:07, Bernd Fehling wrote

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Bernd Fehling
29.11.2010 14:30, Bernd Fehling wrote: Dear list, a question about Solr SignatureUpdateProcessorFactory: for (String field : sigFields) { SolrInputField f = doc.getField(field); if (f != null) { *sig.add(field); Object o = f.getValue(); if (o instanceof String

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Bernd Fehling
On 29.11.2010 14:55, Markus Jelsma wrote: On Monday 29 November 2010 14:51:33 Bernd Fehling wrote: Dear list, another suggestion about SignatureUpdateProcessorFactory. Why can I make signatures of several fields and place the result in one field but _not_ make a signature of one field

Re: question about Solr SignatureUpdateProcessorFactory

2010-11-30 Thread Bernd Fehling
As mentioned, in the typical case it's important that the field names be included in the signature, but i imagine there would be cases where you wouldn't want them included (like a simple concat Signature for building basic composite keys) I think the Signature API could definitely be
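A small sketch of the trade-off being discussed: whether field names are fed into the signature along with the values. This is plain Python using `hashlib.md5` for illustration, not the Solr Signature API; the `signature` helper and the sample documents are hypothetical.

```python
import hashlib


def signature(doc, fields, include_names=True):
    """Hash the selected field values, optionally mixing in the field names."""
    digest = hashlib.md5()
    for field in fields:
        value = doc.get(field)
        if value is None:
            continue  # skip fields the document does not have
        if include_names:
            digest.update(field.encode("utf-8"))
        digest.update(str(value).encode("utf-8"))
    return digest.hexdigest()


fields = ["title", "author"]
doc_a = {"title": "foo"}   # value stored under title
doc_b = {"author": "foo"}  # same value stored under author

# Without names the two documents collide; with names they do not.
print(signature(doc_a, fields, include_names=False) == signature(doc_b, fields, include_names=False))
print(signature(doc_a, fields) == signature(doc_b, fields))
```

Leaving the names out makes the result depend on the values alone, which is the property a simple composite key would want.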

Re: Creating Email Token Filter

2010-11-30 Thread Bernd Fehling
On 30.11.2010 10:56, Greg Smith wrote: Hi, I have written a plugin to filter on email types and keep those tokens, however when I run it in the analysis in the admin it all works fine. But when I use the data import handler to import the data and set the field type it doesn't remove

Re: Dataimport performance

2010-12-15 Thread Bernd Fehling
. Are we wrong with that assumption, or do people experience similar import times with this amount of data to be imported? thanks! -robert -- Bernd Fehling Universitätsbibliothek Bielefeld Dipl.-Inform. (FH

names of index files

2011-01-02 Thread Bernd Fehling
Dear list, some questions about the names of the index files. With an older Solr 4.x version from trunk my index looks like: _2t1.fdt _2t1.fdx _2t1.fnm _2t1.frq _2t1.nrm _2t1.prx _2t1.tii _2t1.tis segments_2 segments.gen With a most recent version from trunk it looks like: _3a9.fdt _3a9.fdx

Re: WARNING: re-index all Lucene trunk indices

2011-01-05 Thread Bernd Fehling
Because this is also posted for solr-user and from some earlier experiences with solr from trunk I think this is also recommended for solr users living from trunk, right? So solr trunk builds directly with lucene trunk? Bernd On 05.01.2011 11:55, Michael McCandless wrote: If you are using

DIH load only selected documents with XPathEntityProcessor

2011-01-06 Thread Bernd Fehling
Hello list, is it possible to load only selected documents with XPathEntityProcessor? While loading docs I want to drop/skip/ignore documents with missing URL. Example: <documents> <document> <title>first title</title> <id>identifier_01</id>

DIH Transformer

2011-01-07 Thread Bernd Fehling
Hi list, currently the Transformers return row but can I skip or drop a row from the Transformer? If so, what should I return in that case, an empty row? Regards, Bernd

Re: DIH load only selected documents with XPathEntityProcessor

2011-01-10 Thread Bernd Fehling
Hi Gora, thanks a lot, very nice solution, works perfectly. I will dig more into ScriptTransformer, seems to be very powerful. Regards, Bernd On 08.01.2011 14:38, Gora Mohanty wrote: On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Hello list
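The selection logic asked about in this thread (drop documents whose URL is missing) can be sketched as follows. This is plain Python over dictionaries for illustration, not a DIH ScriptTransformer; the field name `link` and the sample rows are assumptions.

```python
def keep_rows_with_link(rows):
    """Keep only rows that carry a non-empty link value."""
    return [row for row in rows if (row.get("link") or "").strip()]


rows = [
    {"id": "identifier_01", "link": "http://www.foo.com/path/bar.html"},
    {"id": "identifier_02"},               # link missing entirely
    {"id": "identifier_03", "link": "  "},  # link present but blank
]

print([row["id"] for row in keep_rows_with_link(rows)])
```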

strange SOLR behavior with required field attribute

2011-01-10 Thread Bernd Fehling
Dear list, while trying different options with DIH and ScriptTransformer I also tried using the required="true" option for a field. I have 3 records: <documents> <document> <title>first title</title> <id>identifier_01</id> <link>http://www.foo.com/path/bar.html</link> </document>

Re: strange SOLR behavior with required field attribute

2011-01-10 Thread Bernd Fehling
Hi Koji, I'm using apache-solr-4.0-2010-11-24_09-25-17 from trunk. A grep for SOLR-1973 in CHANGES.txt says that it should have been fixed. Strange... Regards, Bernd On 10.01.2011 16:14, Koji Sekiguchi wrote: (11/01/10 23:26), Bernd Fehling wrote: Dear list, while trying different

LukeRequestHandler histogram?

2011-01-14 Thread Bernd Fehling
Dear list, what is the LukeRequestHandler histogram telling me? Couldn't find any explanation and would be pleased to have it explained. Many thanks in advance, Bernd

Re: LukeRequestHandler histogram?

2011-01-14 Thread Bernd Fehling
:15 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Dear list, what is the LukeRequestHandler histogram telling me? Couldn't find any explanation and would be pleased to have it explained. Many thanks in advance, Bernd

Re: DIH with full-import and cleaning still keeps old index

2011-01-20 Thread Bernd Fehling
Looks like this is a bug and I should write a jira issue for it? Regards Bernd On 20.01.2011 11:30, Bernd Fehling wrote: Hi list, after sending full-import=trueclean=truecommit=true Solr 4.x (apache-solr-4.0-2010-11-24_09-25-17) responds with: - DataImporter doFullImport

Re: DIH with full-import and cleaning still keeps old index

2011-01-23 Thread Bernd Fehling
. Try it out with additional parameter optimize=true - Espen On Thu, Jan 20, 2011 at 11:30 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Hi list, after sending full-import=trueclean=truecommit=true Solr 4.x (apache-solr-4.0-2010-11-24_09-25-17) responds with: - DataImporter

Re: DIH with full-import and cleaning still keeps old index

2011-01-23 Thread Bernd Fehling
:12, Espen Amble Kolstad wrote: I think optimize only ever gets done when either a full-import or delta-import is done. You could optimize the normal way though see: http://wiki.apache.org/solr/UpdateXmlMessages - Espen On Mon, Jan 24, 2011 at 8:05 AM, Bernd Fehling bernd.fehl...@uni

solr admin result page error

2011-02-11 Thread Bernd Fehling
Dear list, after loading some documents via DIH which also include urls I get this yellow XML error page as search result from solr admin GUI after a search. It says XML processing error not well-formed. The code it complains about is: <arr name="dcurls"> <str>http://eprints.soton.ac.uk/43350/</str>

Re: solr admin result page error

2011-02-11 Thread Bernd Fehling
, Bernd Fehling wrote: Dear list, after loading some documents via DIH which also include urls I get this yellow XML error page as search result from solr admin GUI after a search. It says XML processing error not well-formed. The code it complains about is: <arr name="dcurls"> <str>http

Re: solr admin result page error

2011-02-11 Thread Bernd Fehling
February 2011 08:59:27 Bernd Fehling wrote: Dear list, after loading some documents via DIH which also include urls I get this yellow XML error page as search result from solr admin GUI after a search. It says XML processing error not well-formed. The code it complains about is: <arr name="dcurls">

Re: solr admin result page error

2011-02-25 Thread Bernd Fehling
the JSONResponseWriter. On Friday 11 February 2011 15:45:23 Bernd Fehling wrote: Hi Markus, yes it looks like the same issue. There is also a \u utf8-code in your dump. Till now I followed it into XMLResponseWriter. Some steps before the result in a buffer looks good and the utf8-code

Content-Type of XMLResponseWriter / QueryResponseWriter

2011-03-03 Thread Bernd Fehling
Dear list, is there any deeper logic behind the fact that XMLResponseWriter is sending CONTENT_TYPE_XML_UTF8=application/xml; charset=UTF-8 ? I would expect (as would most browsers) to receive text/xml for XML output and not application/xml. Or do you want the browser to call an XML-Editor

Re: Content-Type of XMLResponseWriter / QueryResponseWriter

2011-03-03 Thread Bernd Fehling
, 2011, at 7:30 AM, Bernd Fehling wrote: Dear list, is there any deeper logic behind the fact that XMLResponseWriter is sending CONTENT_TYPE_XML_UTF8=application/xml; charset=UTF-8 ? I would expect (as would most browsers) to receive text/xml for XML output and not application/xml

from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling
Is there a way to have a kind of casting for copyField? I have author names in a multiValued string field and need to sort on it, but sorting is only possible on fields with multiValued=false. I'm trying to get multiValued content from one field to a non-multiValued text or string field for sorting. And

Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling
and make author multiValued and author_sort a string field? Regards Bernd On 17.03.2011 15:39, Gora Mohanty wrote: On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Is there a way to have a kind of casting for copyField? I have author names

Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling
moved this from FAST index-profile to Solr DIH and placed the separator there. But now I'm looking for a solution for VuFind. Easiest thing would be to have a kind of casting, maybe for copyField. Regards, Bernd On 17.03.2011 15:58, Yonik Seeley wrote: On Thu, Mar 17, 2011 at 10:34 AM, Bernd

Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling
a string field? Regards Bernd On 17.03.2011 15:39, Gora Mohanty wrote: On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Is there a way to have a kind of casting for copyField? I have author names in a multiValued string field and need a sorting
