RE: PDF extraction using Tika

2020-08-24 Thread Srinivas Kashyap
Hi Alexandre, Yes, these are the same PDF files running on Windows and Linux. There are around 30 PDF files, and I tried indexing a single file but faced the same error. Is it related to how the PDFs are stored on Linux? And with regard to DIH and Tika going away, can you share if any program which extracts

Re: Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-24 Thread Alexandre Rafalovitch
I guess this gets into the point of whether "children" or whatever field is used for child documents actually needs to be in the schema. Schemaless mode creates one, but that's not a defining factor. Because if it needs to be in the schema, then the code should reflect its cardinality. But if it

Re: Solr with HDFS configuration example running in production/dev

2020-08-24 Thread Joe Obernberger
Are you running with solr.lock.type=hdfs ? Have you defined your DirectoryFactory - something like:     true     true     43     name="solr.hdfs.blockcache.direct.memory.allocation">true     16384     true     true     128     1024    

Re: How to Write Autoscaling Policy changes to Zookeeper/SolrCloud using the autoscaling Java API

2020-08-24 Thread Howard Gonzalez
Good morning! To add more context on the question, I can successfully use the Java API to build the list of new Clauses. However, the problem that I have is that I don't know how to "write" those changes back to solr using the Java API. I see there's a writeMap method in the Policy class

Re: Apache Solr 8.6.0 with SSL

2020-08-24 Thread Jan Høydahl
I think you’re experiencing this: https://issues.apache.org/jira/browse/SOLR-14711 No idea why the bin/solr script won’t work with SSL... Jan > On 24 Aug 2020 at 15:52, Patrik Peng wrote: > > Greetings > > I'm in the process of setting up a SolrCloud cluster with 3 Zookeeper > and 3 Solr

Re: PDF extraction using Tika

2020-08-24 Thread Alexandre Rafalovitch
The issue seems to be with a specific file, at a level well below Solr's or possibly even Tika's: Caused by: java.io.IOException: expected='>' actual=' ' at offset 2383 at org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045) Are you indexing the

PDF extraction using Tika

2020-08-24 Thread Srinivas Kashyap
Hello, We are using TikaEntityProcessor to extract the content of PDFs and make it searchable. When Jetty runs on a Windows-based machine, we are able to load documents successfully using a DIH full import (Tika entity). Here the PDFs are kept on the Windows file system. But when

Apache Solr 8.6.0 with SSL

2020-08-24 Thread Patrik Peng
Greetings I'm in the process of setting up a SolrCloud cluster with 3 Zookeeper and 3 Solr nodes on FreeBSD and wish to enable SSL between the Solr nodes. Before enabling SSL, everything worked as expected and I followed the instructions described in the Solr 8.6 docs

Re: How to perform keyword (exact_title) match in solr with sow=true

2020-08-24 Thread raj.yadav
Hi Community members, I tried the following approaches but none of them worked for my use case. 1. To achieve an exact match in Solr we have to keep sow='false' (Solr will use field-centric matching mode) and group multiple similar fields into one copy field. It does solve the problem of

Re: Simple query

2020-08-24 Thread Jayadevan Maymala
Thanks. I just copied the config file under solr/solr-8.6.0/server/solr/configsets/_default and made minor changes. Tried the console - I think SKMF is doing it. Regards, Jayadevan On Mon, Aug 24, 2020 at 5:45 PM Dominique Bejean wrote: > Hi, > > We need to know how is analyzed your catch_all

Re: ZooKeeper 3.4 end of life

2020-08-24 Thread Erick Erickson
I don’t think you’ll find an official EOL announcement. Here’s a guide, but do note the phrase “on demand” for minor releases. You should interpret “on demand” as when the developers feel the issues in the current point-release code base are numerous enough or critical enough to warrant the

Re: ZooKeeper 3.4 end of life

2020-08-24 Thread h00452626
Hey man, I'm wondering where the announcement is. I searched for ZK's EOL policy but found nothing. Can you send me the link to the announcement? I would be very thankful. -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: All cores gone along with all solr configuration upon reboot

2020-08-24 Thread Erick Erickson
This is consistent with the data disappearing from Zookeeper due to misconfiguration and/or some external process removing it when you reboot. So here’s what I’d do next: Go ahead and reboot. You do _not_ need to start Solr to run bin/solr scripts, and among them are bin/solr zk ls -r / -z

Re: Simple query

2020-08-24 Thread Dominique Bejean
Hi, We need to know how your catch_all field is analyzed at index and search time. I think you are using a stemming filter and "apache" is stemmed to "apach". So "apache" and "apach" match the document, but "apac" does not. You can use the console in order to see how terms are removed or transformed

Re: Solr doesn't run after editing solr.in.sh

2020-08-24 Thread Vincenzo D'Amore
Pay attention to this line: SOLR_ULIMIT_CHECKS=falseGC_TUNE=" \ You lost a newline after false. It should be:
SOLR_ULIMIT_CHECKS=false
GC_TUNE=" \
Ciao, Vincenzo -- mobile: 3498513251 skype: free.dev > On 24 Aug 2020, at 01:41, Walter Underwood wrote: > > Also, what platform is this on and what

Simple query

2020-08-24 Thread Jayadevan Maymala
Hi all, I am learning the basics of Solr querying and am not able to figure something out. The first query, which searches for 'apac', fetches no documents. The second one, which searches for 'apach' (i.e. with one more character, h), fetches a document. curl -X GET "

Re: SOLR Compatibility with Oracle Enterprise Linux 7

2020-08-24 Thread Shawn Heisey
On 8/24/2020 12:46 AM, Wang, Ke wrote: We are using Apache SOLR version 8.4.4.0. The project is planning to upgrade the Linux server from Oracle Enterprise Linux (Red Hat Enterprise Linux) 6 to OEL 7. As I was searching on the Confluence page and was not able to find the information, can I

Re: SOLR Compatibility with Oracle Enterprise Linux 7

2020-08-24 Thread Jörn Franke
Yes, there should be no issues upgrading to RHEL7. I assume you mean Solr 8.4.0. You can also use the latest Solr version. Why not RHEL8? > On 24.08.2020 at 09:02, Wang, Ke wrote: >

SOLR Compatibility with Oracle Enterprise Linux 7

2020-08-24 Thread Wang, Ke
Hi there, We are using Apache SOLR version 8.4.4.0. The project is planning to upgrade the Linux server from Oracle Enterprise Linux (Red Hat Enterprise Linux) 6 to OEL 7. As I was searching on the Confluence page and was not able to find the information, can I please confirm if: * Apache

Re: Solr 8.6.1: Can't round-trip nested document from SolrJ

2020-08-24 Thread Munendra S N
> > Interestingly, I was forced to add children as an array even when the > child was alone and the field was already marked multivalued. It seems > the code does not do conversion to multi-value type, which means the > query code has to be a lot more careful about checking field return > type