SV: DIH for TikaEntityProcessor

2018-10-12 Thread Martin Frank Hansen (MHQ)
> *From:* Martin Frank Hansen (MHQ) > *Sent:* 10 October 2018 10:15 > *To:* solr-user > *Subject:* DIH for TikaEntityProcessor >

SV: DIH for TikaEntityProcessor

2018-10-12 Thread Martin Frank Hansen (MHQ)
schema). > I used the default config, and Solr version 7.5.0; I was able to > import the data just fine (I also tested with .*DOC). Is there any > other information you can provide that can help me reproduce this error? > > > > > On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hanse

SV: Tesseract language

2018-10-22 Thread Martin Frank Hansen (MHQ)
u want configuration over custom code, you > could look at something like Apache NiFI. It can push data into Solr. > Obviously it is a bigger solution, but it is correspondingly more > robust too. > > Regards, >Alex. > On Sun, 21 Oct 2018 at 11:07, Martin Frank Hansen (MHQ

SV: Tesseract language

2018-10-22 Thread Martin Frank Hansen (MHQ)
dingly more > robust too. > > Regards, >Alex. > On Sun, 21 Oct 2018 at 11:07, Martin Frank Hansen (MHQ) > wrote: > > > > Hi Alexandre, > > > > Thanks for your reply. > > > > Yes right now it is just for testing the possibilities of Solr and > Tes

SV: Tesseract language

2018-10-21 Thread Martin Frank Hansen (MHQ)
1:07, Martin Frank Hansen (MHQ) wrote: > > Hi Alexandre, > > Thanks for your reply. > > Yes right now it is just for testing the possibilities of Solr and Tesseract. > > I will take a look at the Tika documentation to see if I can make it work. > > You said that DIH are no

SV: Tesseract language

2018-10-21 Thread Martin Frank Hansen (MHQ)
not sure you can pass parseContext that way and DIH is also not recommended for production. I hope this helps, Alex. On Sun, 21 Oct 2018 at 09:24, Martin Frank Hansen (MHQ) wrote: > Hi again, > > > > Is there anyone who has some experience of using Tesseract’s OCR >

SV: Tesseract language

2018-10-21 Thread Martin Frank Hansen (MHQ)
From: Martin Frank Hansen (MHQ) Sent: 18 October 2018 13:30 To: solr-user@lucene.apache.org Subject: Tesseract language Hi, I have been trying to use Tesseract through the dat

Tesseract language

2018-10-18 Thread Martin Frank Hansen (MHQ)
Hi, I have been trying to use Tesseract through the data-import-handler in Solr and it actually works very well – with English. As the documents are in Danish, I need to change the language setting in Tesseract to Danish as well. Is that possible from Solr? I was using the

SV: DIH for TikaEntityProcessor

2018-10-12 Thread Martin Frank Hansen (MHQ)
From: Martin Frank Hansen (MHQ) Sent: 10 October 2018 10:15 To: solr-user Subject: DIH for TikaEntityProcessor Hi, I am trying to read documents from a file system into Solr using the DataImportHandler, but keep getting the following errors:

RE: Tesseract language

2018-10-26 Thread Martin Frank Hansen (MHQ)
ut.println(handler.toString()); } Hope that someone can help here. -----Original Message----- From: Martin Frank Hansen (MHQ) Sent: 22 October 2018 07:58 To: solr-user@lucene.apache.org Subject: SV: Tesseract language Hi Erick, Thanks for the help! I will take a look at it. Martin Frank Hansen, S

RE: Reading data using Tika to Solr

2018-10-26 Thread Martin Frank Hansen (MHQ)
attachment exceptions. On Fri, Oct 26, 2018 at 6:25 AM Martin Frank Hansen (MHQ) wrote: > Hi again, > > Never mind, I managed to get the content of the msg-files as well > using the following link as inspiration: > https://wiki.apache.org/tika/RecursiveMetadata > > But thanks ag

RE: Tesseract language

2018-10-28 Thread Martin Frank Hansen (MHQ)
27, 2018 at 12:39 AM Martin Frank Hansen (MHQ) > > wrote: > > > Hi Rohan, > > > > Thanks for your reply, are you using tess4j with Tika or on its own? > > I will take a look at tess4j if I can't make it work with Tika alone. > > > > Best regards >

RE: Merging data from different sources

2018-10-31 Thread Martin Frank Hansen (MHQ)
rging data from different sources > > Maybe > https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory > > Regards, > Alex > > On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ), wrote: > > > Hi, > > > > I

Merging data from different sources

2018-10-30 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to merge files from different sources and with different content (except for one key-field) , how can this be done in Solr? An example could be: Document 1 001 Unique id for Document 1 test-123 …

RE: Merging data from different sources

2018-10-30 Thread Martin Frank Hansen (MHQ)
. oktober 2018 13:16 To: solr-user Subject: Re: Merging data from different sources Maybe https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory Regards, Alex On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ), wrote: > Hi, > > I

RE: Tesseract language

2018-10-27 Thread Martin Frank Hansen (MHQ)
, Oct 26, 2018 at 12:31 PM Martin Frank Hansen (MHQ) wrote: > Hi Tim, > > You were right. > > When I called `tesseract testing/eurotext.png testing/eurotext-dan -l > dan`, I got an error message so I downloaded "dan.traineddata" and > added it to the Tesseract-OCR
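
The command quoted above can be assembled programmatically before handing it to a process runner. This is only a minimal sketch: the helper name `build_tesseract_command` is hypothetical, the paths are the ones from the message, and it assumes the `tesseract` binary is on the PATH with `dan.traineddata` installed in its tessdata directory.

```python
import shlex


def build_tesseract_command(image_path, output_base, language="dan"):
    """Return the argv list for a Tesseract run with an explicit language."""
    # "-l" selects the traineddata file; "dan" matches dan.traineddata.
    return ["tesseract", image_path, output_base, "-l", language]


if __name__ == "__main__":
    # Paths are the ones used in the thread; adjust them to your own files.
    command = build_tesseract_command("testing/eurotext.png", "testing/eurotext-dan")
    # Print the command as a shell string; pass the list itself to
    # subprocess.run(command, check=True) to actually execute it.
    print(shlex.join(command))
```

Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.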

RE: Reading data using Tika to Solr

2018-10-25 Thread Martin Frank Hansen (MHQ)
azy and just execute it in > IntelliJ for development and have forgotten to set my classpath on > _numerous_ occasions when running it from a command line ;) > > Best, > Erick > > On Thu, Oct 25, 2018 at 2:55 AM Martin Frank Hansen (MHQ) > wrote: > > > > Hi, >

RE: Tesseract language

2018-10-26 Thread Martin Frank Hansen (MHQ)
able to specify "dan" with your code above. On Fri, Oct 26, 2018 at 10:49 AM Martin Frank Hansen (MHQ) wrote: > > Hi again, > > Now I moved the OCR part to Tika, but I still can't make it work with Danish. > It works when using default language settings and it seems like Tika

RE: Reading data using Tika to Solr

2018-10-26 Thread Martin Frank Hansen (MHQ)
to Solr If you’re processing actual msg (not eml), you’ll also need poi and poi-scratchpad and their dependencies, but then those msgs could have attachments, at which point, you may as well just add tika-app. :D On Thu, Oct 25, 2018 at 2:46 PM Martin Frank Hansen (MHQ) wrote: > Hi Erick and

RE: Reading data using Tika to Solr

2018-10-26 Thread Martin Frank Hansen (MHQ)
Hi again, Never mind, I managed to get the content of the msg-files as well using the following link as inspiration: https://wiki.apache.org/tika/RecursiveMetadata But thanks again for all your help! -Original Message- From: Martin Frank Hansen (MHQ) Sent: 26 October 2018 10:14

Reading data using Tika to Solr

2018-10-25 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to read the content of msg files using Tika and index them in Solr; however, I am having some problems with the OfficeParser(). I keep getting the error java.lang.NoClassDefFoundError for the OfficeParser, even though both tika-core and tika-parsers are included in the build path.

indexing multiple levels of data

2018-11-16 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to add metadata and files to Solr, but am experiencing some problems. The data is divided in two: cases and files. For each case the metadata is given in an XML document, while metadata for the files is given in another XML document, and the actual files are kept in yet

DIH for TikaEntityProcessor

2018-10-10 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to read documents from a file system into Solr using the DataImportHandler, but keep getting the following errors: Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException:

data-import-handler for solr-7.5.0

2018-10-02 Thread Martin Frank Hansen (MHQ)
Hi, I am having some problems getting the data-import-handler in Solr to work. I have tried a lot of things but I simply get no response from Solr, not even an error. When calling the API: http://localhost:8983/solr/nh/dataimport?command=full-import { "responseHeader":{ "status":0,

SV: data-import-handler for solr-7.5.0

2018-10-02 Thread Martin Frank Hansen (MHQ)
":"0:0:0.136"}} Seems like it is not even trying to read the data. Martin Frank Hansen -Original Message- From: Jan Høydahl Sent: 2 October 2018 17:46 To: solr-user@lucene.apache.org Subject: Re: data-import-handler for solr-7.5.0 > url="C:/Users/z6mhq/Desktop/data_

SV: data-import-handler for solr-7.5.0

2018-10-02 Thread Martin Frank Hansen (MHQ)
import/nh_test.xml" > > Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ? > > -- > Jan Høydahl, search solution architect Cominvent AS - > www.cominvent.com > > > On 2 Oct 2018, at 17:15, Martin Frank Hansen (MHQ) wrote: > > > >

SV: data-import-handler for solr-7.5.0

2018-10-02 Thread Martin Frank Hansen (MHQ)
/master/configsets/pets-final/pets-data-config.xml). Regards, Alex. On Tue, 2 Oct 2018 at 12:46, Martin Frank Hansen (MHQ) wrote: > > Thanks for the info, the UI looks interesting... It does read the data-config > correctly, so the problem is probably in this file. > > Martin Frank

SV: DIH for different levels of XML

2018-10-07 Thread Martin Frank Hansen (MHQ)
example that ships with DIH example set. Specifically, at commonField parameter, it may be useful for you: https://lucene.apache.org/solr/guide/7_4/uploading-structured-data-store-data-with-the-data-import-handler.html Regards, Alex. On Sun, 7 Oct 2018 at 13:23, Martin Frank Hansen (MHQ) wro

DIH for different levels of XML

2018-10-07 Thread Martin Frank Hansen (MHQ)
Hi, I am having some difficulties adding data from different levels of an XML document. The XML can be as simple as this: 2165432 5 10 The data-config file looks like this. The result is the following: {

RE: indexing multiple levels of data

2018-11-16 Thread Martin Frank Hansen (MHQ)
, and the burden of building and running a separate app will probably be worth it. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 16 Nov 2018, at 12:24, Martin Frank Hansen (MHQ) wrote: > > Hi, > > I am trying to add metadata and files to Solr, but am ex

RE: MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
ink there are some pictures which are not being sent through in > the email. > > Do send your query that you are using, and which version of Solr you > are using? > > Regards, > Edwin > >> On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) wrote: >>

RE: MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
Sorry, I forgot to mention that we are using Solr 7.5. -Original Message- From: Martin Frank Hansen (MHQ) Sent: 26 February 2019 07:43 To: solr-user@lucene.apache.org Subject: RE: MLT and facetting Hi Edwin, Thanks for your response. Yes you are right

RE: MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
On Mon, 25 Feb 2019 at 20:54, Martin Frank Hansen (MHQ) wrote: > Hi, > > > > I am trying to combine the mlt functionality with facets, but Solr > throws > org.apache.solr.common.SolrException: "Unable to compute facet > ranges, facet context is not set".

RE: MLT and facetting

2019-02-26 Thread Martin Frank Hansen (MHQ)
in solrconfig.xml? Regards, Edwin On Tue, 26 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) wrote: > Hi Edwin, > > Thanks for your response. > > Yes you are right. It was simply the search parameters from Solr. > > The query looks like this: > > http:// > .../solr/.../mlt?df

RE: MLT and facetting

2019-03-01 Thread Martin Frank Hansen (MHQ)
. Regards, Edwin On Thu, 28 Feb 2019 at 14:51, Martin Frank Hansen (MHQ) wrote: > Hi Edwin, > > Ok that is nice to know. Do you know when this bug will get fixed? > > By ordering I mean that MLT scores the documents according to its > similarity function (I believe it is cosine

RE: MLT and facetting

2019-03-01 Thread Martin Frank Hansen (MHQ)
according to the number of > occurrences. But I'm not sure how it will affect the MLT score or how > it will be output when combined, as it is not working > currently and there is no way to test. > > Regards, > Edwin > >> On Thu, 28 Feb 2019 at 14:51, Martin Fr

RE: MLT and facetting

2019-03-01 Thread Martin Frank Hansen (MHQ)
e output when combined, as it is not working > currently and there is no way to test. > > Regards, > Edwin > > On Thu, 28 Feb 2019 at 14:51, Martin Frank Hansen (MHQ) wrote: > >> Hi Edwin, >> >> Ok that is nice to know. Do you know when this bug will get fi

RE: MLT and facetting

2019-02-26 Thread Martin Frank Hansen (MHQ)
the same problem in Solr 7.7 if I turn on faceting in /mlt requestHandler. Found this issue in the JIRA: https://issues.apache.org/jira/browse/SOLR-7883 Seems like it is a bug in Solr and it has not been resolved yet. Regards, Edwin On Tue, 26 Feb 2019 at 21:03, Martin Frank Hansen (MHQ) wrote: >

MLT and facetting

2019-02-25 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to combine the MLT functionality with facets, but Solr throws org.apache.solr.common.SolrException: "Unable to compute facet ranges, facet context is not set". What I am trying to do is quite simple: find similar documents using MLT and group them using the facet parameter.

RE: MLT and facetting

2019-02-27 Thread Martin Frank Hansen (MHQ)
before, so I'm not sure how it works. For the ordering of the documents, do you mean to sort them according to the criteria that you want? Regards, Edwin On Wed, 27 Feb 2019 at 14:43, Martin Frank Hansen (MHQ) wrote: > Hi Edwin, > > Thanks for your response. Are you sure it

Update handler and atomic update

2019-03-19 Thread Martin Frank Hansen (MHQ)
Hi, I hope someone can help me. I am trying to make an incremental update to one document using the API, but cannot make it work. I have tried a lot of things, and all I actually want is to increment the value of the field “clicks” by one. I have something like this:

RE: Update handler and atomic update

2019-03-19 Thread Martin Frank Hansen (MHQ)
"docid","clicks":{"inc":"1"}}] In an /update?commit=true Best regards Thierry See documentation here https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html > On 19 Mar 2019, at 08:14, Martin Frank Hansen (MHQ) wrote: > > Hi, > >

RE: Update handler and atomic update

2019-03-19 Thread Martin Frank Hansen (MHQ)
","clicks":{"inc":"1"}}] in the raw body, hence using curl or any other app that allows this, such as Postman. Best regards Thierry > On 19 Mar 2019, at 08:59, Martin Frank Hansen (MHQ) wrote: > > Hi Thierry, > > Do you mean something like this? > > http://loc
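
The atomic-update body discussed in this thread can be generated rather than typed by hand, which also avoids the curly quotes that mail clients tend to introduce and that are not valid JSON. The following is a minimal sketch, not the exact request from the thread: the helper name `build_increment_payload` is hypothetical, and the id value `docid` is a placeholder.

```python
import json


def build_increment_payload(doc_id, field, amount=1):
    """Return the JSON body for a Solr atomic update that increments one field."""
    # "inc" is the atomic-update modifier that adds to a numeric field.
    return json.dumps([{"id": doc_id, field: {"inc": amount}}])


if __name__ == "__main__":
    # "docid" is a placeholder id; "clicks" is the field named in the thread.
    # Send the printed body as the raw POST body to /update?commit=true
    # with the header Content-Type: application/json.
    print(build_increment_payload("docid", "clicks"))
```

The body has to go in the raw request body, as the reply above notes, so a tool such as curl or Postman (or any HTTP client library) is needed rather than a plain browser URL.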

RE: highlighter, stored documents and performance

2019-03-21 Thread Martin Frank Hansen (MHQ)
without highlighting. > On 21.03.2019 at 17:05, Martin Frank Hansen (MHQ) wrote: > > Hi, > > I am wondering how highlighting in Solr performs when the number > of documents gets large? > > Right now we have about 1 TB of data in all sorts of file types an

highlighter, stored documents and performance

2019-03-21 Thread Martin Frank Hansen (MHQ)
Hi, I am wondering how highlighting in Solr performs when the number of documents gets large. Right now we have about 1 TB of data in all sorts of file types, and I was wondering how storing these documents within Solr (for highlighting purposes) will affect performance. Is it

unable to create new threads: out-of-memory issues

2019-02-12 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to create an index on a small Linux server running Solr-7.5.0, but keep running into problems. When I try to index a file-folder of roughly 18 GB (18000 files) I get the following error from the server: java.lang.OutOfMemoryError: unable to create new native thread. From the

RE: unable to create new threads: out-of-memory issues

2019-02-12 Thread Martin Frank Hansen (MHQ)
. SolrClient is definitely a subject for heavy reuse. On Tue, Feb 12, 2019 at 5:16 PM Martin Frank Hansen (MHQ) wrote: > Hi Mikhail, > > I am using SolrJ but think I might have found the problem. > > I am doing an atomicUpdate on existing documents, and found out that I > creat

RE: unable to create new threads: out-of-memory issues

2019-02-12 Thread Martin Frank Hansen (MHQ)
did you get this error? Usually it occurs in custom code with many new Thread() calls and is usually fixed with thread pooling. On Tue, Feb 12, 2019 at 3:25 PM Martin Frank Hansen (MHQ) wrote: > Hi, > > I am trying to create an index on a small Linux server running > Solr-7.5.0, but

highlighting not working as expected

2019-06-03 Thread Martin Frank Hansen (MHQ)
Hi, I am having some difficulties making highlighting work. For some reason the highlighting feature only works on some fields but not on other fields even though these fields are stored. An example of a request looks like this:

RE: highlighting not working as expected

2019-06-11 Thread Martin Frank Hansen (MHQ)
Please try hl.method=unified and tell us if that helps. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Jun 3, 2019 at 4:06 AM Martin Frank Hansen (MHQ) wrote: > Hi, > > I am having some difficulties making highlighting work. For some
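
A request that follows this suggestion could be built along the lines below. It is a sketch under assumptions: the base URL `http://localhost:8983/solr/mycore`, the query term, and the helper name `build_highlight_query` are hypothetical, and `Sagstitel` is used only because that field is mentioned elsewhere in the thread.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, fields, method="unified"):
    """Return a /select URL with highlighting enabled for the given fields."""
    params = {
        "q": query,
        "hl": "true",                # switch highlighting on
        "hl.fl": ",".join(fields),   # fields to highlight (they must be stored)
        "hl.method": method,         # highlighter implementation to use
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Core name, query term, and field are placeholders for illustration.
    print(build_highlight_query("http://localhost:8983/solr/mycore", "test", ["Sagstitel"]))
```

Comparing the response with and without the `hl.method` parameter is a quick way to tell whether the highlighter choice or the field definition is the cause.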

RE: highlighting not working as expected

2019-06-25 Thread Martin Frank Hansen (MHQ)
6.2019 at 10:06, Martin Frank Hansen (MHQ) wrote: > > Hi, > > I am having some difficulties making highlighting work. For some reason the > highlighting feature only works on some fields but not on other fields even > though these fields are stored. > > An example of a re

RE: highlighting not working as expected

2019-06-17 Thread Martin Frank Hansen (MHQ)
of the documents? > On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote: > > Hi, > > I am having some difficulties making highlighting work. For some reason the > highlighting feature only works on some fields but not on other fields even > though these fields are s

RE: highlighting not working as expected

2019-06-17 Thread Martin Frank Hansen (MHQ)
using for the field “Sagstitel”? Is it the same as other fields? Regards, Edwin On Mon, 3 Jun 2019 at 16:06, Martin Frank Hansen (MHQ) wrote: > Hi, > > I am having some difficulties making highlighting work. For some > reason the highlighting feature only works on some fields but no

RE: highlighting not working as expected

2019-07-01 Thread Martin Frank Hansen (MHQ)
ype definition of those > fields? Could this word be omitted or with wrong encoding during > loading of the documents? > > > On 03.06.2019 at 10:06, Martin Frank Hansen (MHQ) wrote: > > > > Hi, > > > > I am having some difficulties making highlighting work. For some >

solr-injection

2020-02-11 Thread Martin Frank Hansen (MHQ)
Hi, I was wondering how others are handling Solr injection in their solutions. After reading this post: https://www.waratek.com/apache-solr-injection-vulnerability-customer-alert/ I can see how important it is to update to Solr 8.2 or higher. Has anyone been successful in injecting