Re: [CODE4LIB] MARC field lengths

2013-10-16 Thread Nicolas Franck
Are you familiar with the OAI-PMH protocol? We have almost 2 million records available over this protocol: http://search.ugent.be/meercat/x/oai?verb=ListRecords&metadataPrefix=marcxml From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen Coyle
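
For readers who have not used OAI-PMH: a harvest request like the one above is a plain HTTP GET with a `verb` and its arguments in the query string. Below is a minimal Python sketch that only builds such URLs, assuming the base URL from the message. The helper name `build_list_records_url` is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL taken from the message above.
BASE_URL = "http://search.ugent.be/meercat/x/oai"


def build_list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: when it is
        # present, no other argument besides the verb may be sent.
        params["resumptionToken"] = resumption_token
    elif metadata_prefix is not None:
        params["metadataPrefix"] = metadata_prefix
    else:
        raise ValueError("need either metadata_prefix or resumption_token")
    return base_url + "?" + urlencode(params)


first_page_url = build_list_records_url(BASE_URL, metadata_prefix="marcxml")
```

A harvester would fetch `first_page_url`, read the `resumptionToken` element from the response if there is one, and call the helper again with that token until no token is returned.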

Re: [CODE4LIB] pdf2txt

2013-10-14 Thread Nicolas Franck
Could this also be done with Apache Tika? Or am I missing a crucial point? http://tika.apache.org/1.4/gettingstarted.html Apparently it has a command-line utility that extracts metadata and content from various document formats and prints them to standard output. The output can then be supplied to
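
A rough sketch of what calling that command-line utility from a script could look like, in Python. The jar location `tika-app-1.4.jar` is an assumption about where the Tika app jar was downloaded, and the helper names are hypothetical; `--text` and `--metadata` are the Tika app options for plain-text content and metadata output.

```python
import subprocess

# Assumption: the Tika app jar was downloaded to the working directory.
TIKA_JAR = "tika-app-1.4.jar"


def tika_command(document_path, mode="--text"):
    """Return the argv list for one Tika app run (hypothetical helper)."""
    if mode not in ("--text", "--metadata"):
        raise ValueError("mode must be '--text' or '--metadata'")
    return ["java", "-jar", TIKA_JAR, mode, document_path]


def extract(document_path, mode="--text"):
    """Run Tika and return whatever it printed to standard output."""
    result = subprocess.run(
        tika_command(document_path, mode),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tika exits with an error
    )
    return result.stdout
```

Calling `extract("paper.pdf")` requires Java and the jar to be present; `tika_command` on its own only assembles the command line.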

Re: [CODE4LIB] pdf2txt

2013-10-12 Thread Nicolas Franck
Some PDFs work, but one PDF I posted results in this error: Software error: The file argument (/var/www/html/sandbox/pdf2txt/tmp/1381581065.txt) passed to this method is invalid: No such file or directory For help, please send mail to the webmaster (root@localhost), giving this

Re: [CODE4LIB] pdf2txt

2013-10-12 Thread Nicolas Franck
Of Eric Lease Morgan [emor...@nd.edu] Sent: Saturday, October 12, 2013 3:49 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] pdf2txt On Oct 12, 2013, at 8:31 AM, Nicolas Franck nicolas.fra...@ugent.be wrote: For a limited period of time I am making publicly available a Web-based program

[CODE4LIB] solr update json, boost for record

2013-10-08 Thread Nicolas Franck
-writer can produce (I'm using JSON from Perl): { "add": { "boost": 2.0, "doc": {} }, "add": { "boost": 2.0, "doc": {} } } Any ideas? Thanks in advance! Greetings, Nicolas Franck
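
The preview is cut off, but assuming the difficulty is that a hash (or dict) cannot hold two `add` keys, one workaround is to serialize each command separately and join the pieces by hand. A minimal sketch in Python rather than Perl; the function name and the sample documents are hypothetical:

```python
import json


def build_solr_update(docs_with_boost):
    """Serialize several "add" commands into one JSON object.

    A dict cannot hold the key "add" twice, so each command is dumped
    separately and the key/value pairs are joined manually.
    """
    parts = []
    for doc, boost in docs_with_boost:
        command = {"boost": boost, "doc": doc}
        parts.append('"add": ' + json.dumps(command))
    return "{" + ", ".join(parts) + "}"


# Hypothetical documents, for illustration only.
payload = build_solr_update([({"id": "1"}, 2.0), ({"id": "2"}, 2.0)])
```

The result is syntactically valid JSON with a repeated key, which is the shape shown in the message. Most JSON parsers keep only the last `add` when reading it back, so it should be treated as write-only output for Solr.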

Re: [CODE4LIB] solr computation field norm problem

2013-10-01 Thread Nicolas Franck
/lucene/misc/SweetSpotSimilarity.html Erik On Sep 26, 2013, at 8:02 AM, Nicolas Franck nicolas.fra...@ugent.be wrote: I've been testing with Solr 4 (Lucene 4) that uses the new DefaultSimilarity class. It does not use the encodeNorm and decodeNorm methods anymore that caused all

Re: [CODE4LIB] solr computation field norm problem

2013-09-26 Thread Nicolas Franck
: Jay Hill says fields with 3 terms and 4 terms both score at .5 in the lengthNorm. On Wed, Sep 25, 2013 at 4:21 PM, Nicolas Franck nicolas.fra...@ugent.be wrote: Hi there, I have a question about the way Lucene computes the length norm or field norm for its documents. My documents

Re: [CODE4LIB] solr computation field norm problem

2013-09-26 Thread Nicolas Franck
into this: http://lucene.472066.n3.nabble.com/field-length-normalization-tp495308p495311.html TL;DR: Jay Hill says fields with 3 terms and 4 terms both score at .5 in the lengthNorm. On Wed, Sep 25, 2013 at 4:21 PM, Nicolas Franck nicolas.fra...@ugent.be wrote: Hi there, I have a question
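
The TL;DR can be checked by hand. The sketch below assumes the classic default similarity, where the length norm is 1/sqrt(number of terms) and is stored in a single byte. The two conversion functions are my own Python port of Lucene's `SmallFloat.floatToByte315` and `byte315ToFloat` as I understand them, and `stored_norm` is a hypothetical helper, so treat this as an illustration rather than the library's code.

```python
import math
import struct

ZERO_POINT = (63 - 15) << 3  # offset used by the one-byte encoding


def float_to_byte315(value):
    """Compress a float into one byte, dropping the low mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> (24 - 3)
    if small <= ZERO_POINT:
        return 0 if bits <= 0 else 1  # underflow
    if small >= ZERO_POINT + 0x100:
        return 255  # overflow
    return small - ZERO_POINT


def byte315_to_float(encoded):
    """Expand the one-byte form back into a float."""
    if encoded == 0:
        return 0.0
    bits = ((encoded & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms):
    """Length norm as it survives the encode/decode round trip."""
    exact = 1.0 / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(exact))


norms = {n: stored_norm(n) for n in (1, 2, 3, 4)}
```

Under these assumptions the exact norm for 3 terms (about 0.577) is truncated to 0.5, the same stored value as for 4 terms, which is why the two field lengths cannot be told apart at scoring time.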

[CODE4LIB] solr computation field norm problem

2013-09-25 Thread Nicolas Franck
Hi there, I have a question about the way Lucene computes the length norm or field norm for its documents. My documents are indexed using Solr. These are the documents that were indexed (ignore 'score', which is not part of the document itself): <doc> <float name="score">1.00711</float> <str