Not familiar with the contrib you mentioned, or the rationale behind
its removal. But as to your first question, you might be interested
in looking at: https://github.com/lucidworks/hadoop-solr
Disclaimer: I help maintain the "hadoop-solr" project mentioned.
On Thu, Oct 18, 2018 at 8:17 AM
On 10/21/2018 01:06 PM, Shawn Heisey wrote:
> You do it with the request, not with the client
For UpdateRequests, is it the "commitWithinMs" parameter? To me, this
parameter sounds like telling the Solr server "I need to see this data within x
ms". As we have autoCommit and autoSoftCommit
...
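A minimal sketch of what "with the request" means for `commitWithin`, assuming Solr's HTTP `/update` endpoint with a JSON body. The base URL and core name are placeholders. As far as I know, `commitWithin` is a per-request hint that the posted documents should become searchable within that many milliseconds. It does not switch off `autoCommit`/`autoSoftCommit` from solrconfig.xml, which keep running on their own schedule. In SolrJ the same value can be set with `UpdateRequest.setCommitWithin(int)` or the `add(collection, doc, commitWithinMs)` overload. The sketch uses only the JDK's `java.net.http` classes (Java 11+) and only builds the request:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class CommitWithinRequest {
    private CommitWithinRequest() {}

    /**
     * Builds (but does not send) a JSON update request for one core that asks
     * Solr to make the posted documents searchable within commitWithinMs ms.
     * baseUrl is assumed to look like "http://localhost:8983/solr".
     */
    static HttpRequest jsonUpdate(String baseUrl, String core, String jsonDocs, int commitWithinMs) {
        URI uri = URI.create(baseUrl + "/" + core + "/update?commitWithin=" + commitWithinMs);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs, StandardCharsets.UTF_8))
                .build();
    }
}
```

The request can then be sent with a shared `HttpClient`, e.g. `client.send(req, HttpResponse.BodyHandlers.ofString())`.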
Hi Alex,
Thanks again for your reply, much appreciated.
Martin Frank Hansen, Senior Data Analytiker
Data, IM & Analytics
Lautrupparken 40-42, DK-2750 Ballerup
E-mail m...@kmd.dk Web www.kmd.dk
Mobil +4525571418
-Original Message-
From: Alexandre Rafalovitch
Sent: 21 October
Hi Alexandre,
Thank you.
How does this explain why the issue occurs only with SolrCloud and not with standalone?
Moshe
From: Alexandre Rafalovitch
Sent: Sunday, October 21, 2018 5:18:24 PM
To: solr-user
Subject: Re: Error while indexing Thai core with SolrCloud
I would
There are a couple of things mixed in here:
1) The Extract handler is not recommended for production use. It is great for
a quick test, like the one you did, but for production it is better to run it
externally. Tika, especially with large files, can use up a lot
of memory and trip up the Solr
Hi,
We have a specific exception that happens only on the Thai core and only when we're
using SolrCloud.
The same indexing activity runs successfully on the EN core with
SolrCloud, or on the Thai core with a standalone configuration.
We're running on Linux with Solr 4.6
and with
I would check if the Byte-order mark is the cause:
https://en.wikipedia.org/wiki/Byte_order_mark
The error message does not seem to be a perfect match for this issue,
but it is a good thing to check anyway.
That symbol (right at the file start) is usually invisible and can
trip Java XML parsers for
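A minimal diagnostic sketch, assuming the files being indexed are UTF-8: it checks whether a file starts with the UTF-8 encoding of the byte-order mark (EF BB BF) and shows how to drop it before parsing. UTF-16 BOMs use different bytes and are not covered here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class BomCheck {
    private BomCheck() {}

    // The UTF-8 encoding of U+FEFF (the byte-order mark).
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** True if the data starts with the three-byte UTF-8 BOM. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    /** Returns the data without a leading UTF-8 BOM (the same array if there is none). */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length) : data;
    }

    /** Usage: java BomCheck file1.xml file2.xml ... */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg + (hasUtf8Bom(bytes) ? ": UTF-8 BOM present" : ": no UTF-8 BOM"));
        }
    }
}
```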
Hi Alexandre,
Thanks for your reply.
Yes, right now it is just for testing the possibilities of Solr and Tesseract.
I will take a look at the Tika documentation to see if I can make it work.
You said that DIH is not recommended for production use; what are the
recommended methods to upload
Thank you. Will check all options and let you know.
From: Alexandre Rafalovitch
Sent: Sunday, October 21, 2018 8:09:34 PM
To: solr-user
Subject: Re: Error while indexing Thai core with SolrCloud
Ok,
That may have been a bit too much :-) However, it was useful.
On 10/21/2018 11:43 AM, Clemens Wyss DEV wrote:
If I omit the core in the url upon creation of the SolrClient, where can I then
"indicate" the core?
You do it with the request, not with the client.
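In SolrJ 7.x terms this means building the client against the node's base URL (e.g. `http://host:8983/solr`) and passing the core name to calls such as `query(collection, params)` or `add(collection, doc)`, rather than keeping one client per core in a HashMap. The sketch below shows the same idea with only the JDK's `java.net.http` client (Java 11+): one client object per node, with the core chosen in each request's path. The class name `SolrNode` and the base URL are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class SolrNode {
    // One HTTP client for the whole node, reused for every core and request.
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl; // e.g. "http://localhost:8983/solr" (placeholder)

    SolrNode(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Builds a /select request; the core is picked here, per request. */
    HttpRequest selectRequest(String core, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + core + "/select?q=" + q))
                .GET()
                .build();
    }

    /** Sends a query to the named core using the shared client. */
    String select(String core, String query) throws IOException, InterruptedException {
        return http.send(selectRequest(core, query), HttpResponse.BodyHandlers.ofString()).body();
    }
}
```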
Hi again,
Is there anyone who has some experience of using Tesseract’s OCR module within
Solr? The files I am trying to read into Solr are Danish TIFF documents.
Ok,
If the same file and the same core definition works on a standalone,
then the issue may be different. Can you please share the full stack
trace of the message. It may be important to see which thread died.
Also, I would just spin up a test Solr 7.5 instance and see if the
problem is still
Ok,
That may have been a bit too much :-) However, it was useful.
There seem to be several possible avenues:
1) You are using SolrJ and your SolrJ version is not the same as the
version of the Solr server. There was a bunch of things that could
trigger, especially in combination with Unicode
Thx Shawn!
> If they're sleeping, then it's unlikely that there's any real contribution to
> system load.
I know, but
> seeing threads you didn't expect to see?
exactly this
> You should really be keeping one SolrClient per server node,
> and indicating which core to access with each request
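To capture the stack traces of the sleeping "Connection evictor" threads discussed here, `jstack <pid>` from the JDK works without code changes. If a programmatic dump from inside the client application is more convenient, a minimal sketch using only `Thread.getAllStackTraces()` could look like this; the class name `ThreadDump` is hypothetical, and the name fragment is simply the thread name reported in this thread.

```java
import java.util.Map;

final class ThreadDump {
    private ThreadDump() {}

    /**
     * Returns the state and stack trace of every live thread whose name
     * contains nameFragment (e.g. "Connection evictor"), or "" if none match.
     */
    static String threadsNamed(String nameFragment) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(nameFragment)) {
                continue;
            }
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Threads in `TIMED_WAITING` inside `Thread.sleep` will show that frame at the top of their trace.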
Usually, we just say to do a custom solution using SolrJ client to
connect. This gives you maximum flexibility and allows you to integrate
Tika either inside your code or as a server. The latest Tika actually has
some off-thread handling, I believe, to make it safer to embed.
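One way to keep Tika away from Solr's heap, sketched under assumptions: the tika-app jar (its path is a placeholder) runs in its own JVM with a capped heap, using tika-app's `--text` and `--encoding` options, so a very large or malformed file fails in that child process rather than in Solr. The extracted text would then be sent to Solr with SolrJ or the HTTP update API. Running tika-server or embedding Tika in your own code are the other options mentioned above; the class name `ExternalTika` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

final class ExternalTika {
    private ExternalTika() {}

    /**
     * Command line that runs tika-app in a separate JVM with its own heap cap,
     * printing the plain text of one file as UTF-8 on stdout.
     */
    static List<String> textCommand(Path tikaAppJar, Path input, String maxHeap) {
        return List.of("java", "-Xmx" + maxHeap, "-jar", tikaAppJar.toString(),
                "--text", "--encoding=UTF-8", input.toString());
    }

    /** Runs the extraction; an OutOfMemoryError while parsing stays in the child JVM. */
    static String extractText(Path tikaAppJar, Path input, String maxHeap)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(textCommand(tikaAppJar, input, maxHeap))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        byte[] out = p.getInputStream().readAllBytes(); // drain stdout before waitFor so the pipe cannot fill up
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tika-app exited with code " + exit + " for " + input);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```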
For DIH alternatives, if you
On 10/21/2018 10:13 AM, Clemens Wyss DEV wrote:
I just upgraded from 6.6 to 7.5 and am now seeing many "Connection
evictor" threads, which are all in Thread.sleep() ...
What's the stacktrace on those threads? If they're sleeping, then it's
unlikely that there's any real contribution to system
I just upgraded from 6.6 to 7.5 and am now seeing many "Connection
evictor" threads, which are all in Thread.sleep() ...
As of 6.6 I am keeping the SolrClients (one per core) in a HashMap. Is this ok
or should I create a new SolrClient for each request I am doing?
SolrClient creation is as
Hi,
Thank you.
Full stacktrace below
"core_node_name":"172.19.218.201:8082_solr_core_th"}
DEBUG - 2018-10-19 02:13:20.343; org.apache.zookeeper.ClientCnxn$SendThread; Reading reply sessionid:0x200b5a04a770005, packet:: clientPath:null serverPath:null finished:false header:: 356,1
Here's a skeletal program that uses Tika in a stand-alone client. Rip
the RDBMS parts out
https://lucidworks.com/2012/02/14/indexing-with-solrj/
On Sun, Oct 21, 2018 at 1:13 PM Alexandre Rafalovitch
wrote:
>
> Usually, we just say to do a custom solution using SolrJ client to
> connect. This
Hi Martin,
I wrote a framework (https://github.com/nsoft/jesterj) that is meant to
help with small to medium custom solutions. It's not (yet) ready for cases
where you need multiple machines feeding data, but so long as a single box
can do the work it should be useful. It has a basic Tika stage