Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Teague James
Erik's comments notwithstanding, there are some gaps in my understanding of your precise situation. Here are a few things that weren't necessarily obvious to me when I took my first try with Solr. Highlighting is the end result of a good hit. It is essentially formatting applied to your hit. It is

add text analyzer to solr

2015-12-17 Thread sara hajili
Hi. I want to change the Solr analyzer, specifically the normalization, because Solr's default normalization for the Persian language doesn't satisfy me. So I started reading about Solr plugins and tried to implement my own PersianNormalization. Now I have 2 classes, like this: class persianNormalizer extends TokenFilter. and

Re: add text analyzer to solr

2015-12-17 Thread Binoy Dalal
Why don't you post the entire stack trace from the logs? That might give us a better idea of how to help you. On Thu, 17 Dec 2015, 13:59 sara hajili wrote: > hi. > i wanna to change solr analyzer , like normalization. > because solr default normalization for persian language

Re: pf2 pf3 and stopwords

2015-12-17 Thread Binoy Dalal
For this case of inversion in particular, a slop of 1 won't cause any issues, since such a reverse match would require a slop of 2. On Thu, 17 Dec 2015, 14:20 elisabeth benoit wrote: > Inversion (paris charonne or charonne paris) cannot be scored the same. > >
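
The point about slop can be illustrated with a small sketch. It uses a simplified model of sloppy phrase matching, not Lucene's actual scorer: each term's document position is shifted by its offset in the query, and the slop needed is the spread of the shifted positions. The function name `required_slop` and the toy positions are assumptions made for illustration.

```python
def required_slop(query_terms, doc_positions):
    """Smallest slop at which the phrase matches, under a simplified model.

    query_terms: terms in query order.
    doc_positions: hypothetical mapping term -> position in the document.
    """
    # Shift each document position by the term's offset within the query.
    shifted = [doc_positions[term] - offset
               for offset, term in enumerate(query_terms)]
    # The spread of the shifted positions is the slop the match needs.
    return max(shifted) - min(shifted)


query = ["paris", "charonne"]

# Document text "paris charonne": same order as the query.
in_order = required_slop(query, {"paris": 0, "charonne": 1})

# Document text "charonne paris": the two terms are swapped.
reversed_order = required_slop(query, {"paris": 1, "charonne": 0})

print(in_order, reversed_order)  # prints "0 2"
```

Under this model a slop of 1 tolerates one extra word between the terms but not a swap of two adjacent terms, which is why the inverted form does not match the phrase boost.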

Re: Issues when indexing PDF files

2015-12-17 Thread Zheng Lin Edwin Yeo
Hi Alexandre, Thanks for your reply. So the only way to solve this issue is to explore with PDF specific tools and change the encoding of the file? Is there any way to configure it in Solr? Regards, Edwin On 17 December 2015 at 15:42, Alexandre Rafalovitch wrote: > They

Re: pf2 pf3 and stopwords

2015-12-17 Thread elisabeth benoit
Inversion (paris charonne or charonne paris) cannot be scored the same. 2015-12-16 11:08 GMT+01:00 Binoy Dalal : > What is your exact use case? > > On Wed, 16 Dec 2015, 13:40 elisabeth benoit > wrote: > > > Thanks for your answer. > > > >

Partial update through DIH

2015-12-17 Thread Midas A
Hi, can we do a partial update through the Data Import Handler? Regards, Abhishek

RE: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Markus Jelsma
Hi - looks like Solr did not start up correctly, got some errors and kept Jetty running. You should find information in that node's logs. M. -Original message- > From:Andrej van der Zee > Sent: Thursday 17th December 2015 10:32 > To:

Re: warning while indexing

2015-12-17 Thread Mikhail Khludnev
On Thu, Dec 17, 2015 at 8:00 AM, Midas A wrote: > > org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118) > It seems like you specify commitWithin, which is legal but seems unusual and doubtful with DIH. > > rejected from

Re: Partial update through DIH

2015-12-17 Thread Mikhail Khludnev
Hmm, it's interesting. According to the code, you can create a transformer that does what is described at http://yonik.com/solr/atomic-updates/ in *Atomic Updates with SolrJ*. It should/might work, but I've never tried. On Thu, Dec 17, 2015 at 12:26 PM, Midas A wrote: >
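
For reference, an atomic update document carries a modifier map such as {"set": value} in place of the plain field value. The sketch below only builds that JSON shape; it is not a DIH transformer, and the helper name `atomic_update`, the unique key `id`, and the field `price` are assumptions made for illustration.

```python
import json


def atomic_update(doc_id, changes):
    """Build one atomic-update document that sets each field in `changes`.

    Assumes the collection's unique key field is called "id".
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        # {"set": value} asks Solr to replace just this field's value.
        doc[field] = {"set": value}
    return doc


# Hypothetical document id and field name.
payload = json.dumps([atomic_update("book1", {"price": 12.5})])
print(payload)
```

A body like this would be posted to the collection's update handler; whether a DIH transformer can emit the same structure is exactly the untested part mentioned above.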

Re: Issues when indexing PDF files

2015-12-17 Thread Binoy Dalal
You can always write an update handler plugin to convert your PDFs to UTF-8 and then push them to Solr. On Thu, 17 Dec 2015, 14:16 Zheng Lin Edwin Yeo wrote: > Hi Alexandre, > > Thanks for your reply. > > So the only way to solve this issue is to explore with PDF specific

Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-17 Thread Mikhail Khludnev
This fix definitely helps for facet.field over a docValues field on a multi-segment index since 5.4. I suppose it's irrelevant to JSON Facets, non-dv fields, and pre-5.4. I cannot comment on comparing the performance of dv and non-dv fields, because "it depends" (c); benchmarking and a profiler are the only

Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Andrej van der Zee
Hi, I am having trouble getting data from a particular shard, even though I follow the documentation: https://cwiki.apache.org/confluence/display/solr/Distributed+Requests This is OK: curl " http://54.93.121.54:8986/solr/connects/select?q=*%3A*=json=true; { // returns correct result set }

A field _indexed_at_tdt added when I index documents.

2015-12-17 Thread Guillermo Ortiz
I'm indexing documents in Solr with Spark, and it's missing a field, _indexed_at_tdt, which doesn't exist in my documents. I have added this field to my schema. Why is this field being added? Any solution?

Re: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Andrej van der Zee
It turns out that the documentation is not correct. If I specify the collection name after shards=, it does work as expected. So this works: curl " http://54.93.121.54:8986/solr/connects/select?q=*%3A*=json=true=1000=54.93.121.54:8986/solr/connects " This does not work: curl "
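
A minimal sketch of the finding above: each entry in the shards parameter carries the collection name after the host. The host and collection come from the message; the helper name `select_url` and the use of only the q and shards parameters are assumptions made for illustration.

```python
from urllib.parse import urlencode


def select_url(host, collection, shard_hosts, query="*:*"):
    """Build a distributed /select URL whose shards entries name the collection."""
    # Each shard entry is host/solr/<collection>, not just host/solr.
    shards = ",".join(f"{h}/solr/{collection}" for h in shard_hosts)
    params = urlencode({"q": query, "shards": shards})
    return f"http://{host}/solr/{collection}/select?{params}"


url = select_url("54.93.121.54:8986", "connects", ["54.93.121.54:8986"])
print(url)
```

urlencode percent-encodes the colon and slashes inside the shards value, which the server decodes again on receipt.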

RE: Issues when indexing PDF files

2015-12-17 Thread Allison, Timothy B.
Generally, I'd recommend opening an issue on PDFBox's Jira with the file that you shared. Tika uses PDFBox...if a fix can be made there, it will propagate back through Tika to Solr. That said, PDFBox 2.0-RC2 extracts no text and warns: WARNING: No Unicode mapping for CID+71 (71) in font

Re: Solr 6 Distributed Join

2015-12-17 Thread Akiel Ahmed
Hi again, I got the join to work. A team mate pointed out that one of the search functions in the innerJoin query was missing a field in the join - adding the e1 field to the fl parameter of the second search function gave the result I expected:

Problem with Solr indexing "non-searchable" pdf files

2015-12-17 Thread RICARDO EITO BRUN
Hi, I am using SOLR as part of the dspace 5.4 SW application. I have a problem when running the dspace indexing command (index-discovery). Most of the files are not being added to the index, and an exception is raised. It seems that Solr does not process the PDF files that are the result of scanning

Re: propagate Query.rewrite call to super.rewrite after 5.4 upgrade

2015-12-17 Thread Adrien Grand
Hi Markus, This is indeed related to LUCENE-6590: query boosts are now applied with BoostQuery and if Query.setBoost is called on a query, its rewrite implementation needs to rewrite to a BoostQuery. You can do that by prepending the following to your rewrite(IndexReader) implementation: if

Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-17 Thread Yonik Seeley
On Wed, Dec 16, 2015 at 4:57 AM, Vincenzo D'Amore wrote: > Hi all, > > given that Solr 5.4 is finally released, is this the most stable and > efficient version of SolrCloud? > > I have a website which receives many search requests. It normally serves > about 2000 concurrent

SolR 5.3.1 deletes index files

2015-12-17 Thread Moll, Dr. Andreas
Hi, we are using SolR for some years now and are currently switching from SolR 3.6 to 5.3.1. SolR 5.3.1 deletes all index files when it shuts down and there were external changes on the index-files (in our case from a second SolR-server which produces the index). Is this behaviour intentional?

Re: Solr 6 Distributed Join

2015-12-17 Thread Joel Bernstein
The innerJoin joins two streams sorted by the same join keys (merge join). If a third stream has the same join keys you can nest innerJoins. But all three tables need to be sorted by the same join keys to nest innerJoins (merge joins). innerJoin(innerJoin(...), search(...),
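
The merge join described above can be sketched in plain Python. This is an illustration of the general technique, not Solr's implementation; the function name `merge_join` and the sample rows are assumptions. Because the output stays sorted by the join key, the result of one join can feed another, which is what makes nesting possible.

```python
def merge_join(left, right, key):
    """Inner-join two iterables of dicts that are both sorted ascending by `key`."""
    left_it, right_it = iter(left), iter(right)
    l, r = next(left_it, None), next(right_it, None)
    while l is not None and r is not None:
        if l[key] < r[key]:
            l = next(left_it, None)   # left is behind: advance it
        elif l[key] > r[key]:
            r = next(right_it, None)  # right is behind: advance it
        else:
            k = l[key]
            # Buffer only the right-hand rows that share the current key.
            run = []
            while r is not None and r[key] == k:
                run.append(r)
                r = next(right_it, None)
            # Pair every left row with this key against the buffered run.
            while l is not None and l[key] == k:
                for match in run:
                    yield {**l, **match}
                l = next(left_it, None)


# Hypothetical streams, each already sorted by "id".
people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
orders = [{"id": 2, "item": "x"}, {"id": 3, "item": "y"}, {"id": 3, "item": "z"}]
cities = [{"id": 3, "city": "q"}]

# Nested join: the inner result is still sorted by "id".
joined = list(merge_join(merge_join(people, orders, "id"), cities, "id"))
print(joined)
```

Only the rows for one key are held in memory at a time, so the inputs can be arbitrarily long as long as they arrive sorted.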

Re: SolR 5.3.1 deletes index files

2015-12-17 Thread Shawn Heisey
On 12/17/2015 8:00 AM, Moll, Dr. Andreas wrote: > we are using SolR for some years now and are currently switching from SolR > 3.6 to 5.3.1. > SolR 5.3.1 deletes all index files when it shuts down and there were external > changes on the index-files > (in our case from a second SolR-server which

Re: Issues when indexing PDF files

2015-12-17 Thread Walter Underwood
PDF isn’t really text. For example, it doesn’t have spaces, it just moves the next letter over farther. Letters might not be in reading order — two column text could be printed as horizontal scans. Custom fonts might not use an encoding that matches Unicode, which makes them encrypted (badly).

Re: Solr 6 Distributed Join

2015-12-17 Thread Joel Bernstein
Below is an example of nested joins where the innerJoin is done in parallel using the parallel function. The partitionKeys parameter needs to be added to the searches when the parallel function is used to partition the results across worker nodes. hashJoin(

Re: Solr 6 Distributed Join

2015-12-17 Thread Joel Bernstein
One thing to note about the hashJoin is that it requires the search results from the hashed query to fit entirely in memory. The innerJoin does not have this requirement as it performs a streaming merge join. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Dec 17, 2015 at 10:33 AM, Joel
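
The memory trade-off noted above can be seen in a plain-Python sketch of a hash join. It illustrates the general technique rather than Solr's implementation; the function name `hash_join` and the sample rows are assumptions. The hashed side is read completely into a dictionary before the first row of the other stream is processed.

```python
from collections import defaultdict


def hash_join(stream, hashed, key):
    """Inner-join `stream` against `hashed`; neither input needs to be sorted."""
    # The whole hashed side is loaded up front, so it must fit in memory.
    table = defaultdict(list)
    for row in hashed:
        table[row[key]].append(row)
    # The other side is streamed through one row at a time.
    for row in stream:
        for match in table.get(row[key], ()):
            yield {**row, **match}


# Hypothetical, deliberately unsorted inputs.
left = [{"id": 3, "a": 1}, {"id": 1, "a": 2}, {"id": 7, "a": 3}]
right = [{"id": 1, "b": 9}, {"id": 3, "b": 8}]

rows = list(hash_join(left, right, "id"))
print(rows)
```

The benefit is that no sort order is required; the cost is that memory use grows with the size of the hashed side, whereas a merge join over sorted streams holds only the rows for the current key.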

Re: API accessible without authentication even though Basic Auth Plugin is enabled

2015-12-17 Thread tine-2
Noble Paul നോബിള്‍ नोब्ळ् wrote > It works as designed. > > Protect the read path [...] Works like described in 5.4.0, didn't work in 5.3.1, s. https://issues.apache.org/jira/browse/SOLR-8408 -- View this message in context:

Re: Slow query response.

2015-12-17 Thread Jack Krupansky
A single query with tens of thousands of terms is very clearly a misuse of Solr. If it happens to work at all, consider yourself lucky. Are you using a standard Solr query parser or the terms query parser that lets you write a raw list of terms to OR? Are your nodes CPU-bound or I/O-bound during

RE: Use multiple istance simultaneously

2015-12-17 Thread Gian Maria Ricci - aka Alkampfer
Hi, I have a quick question on ZooKeeper: how can I run ZooKeeper as a service on Linux so that it autostarts if the instance is rebooted? The only information I've found on the internet is at this link http://positivealex.github.io/blog/posts/how-to-install-zookeeper-as-service-on-centos and it seems

Re: Problem with Solr indexing "non-searchable" pdf files

2015-12-17 Thread Erick Erickson
Not sure how much help I can be, I have no clue what DSpace is doing with Solr. If you're willing to try to index straight to Solr, you can always use SolrJ to parse the files, it's actually not very hard. Here's an example: https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ some

Re: Strange debug output for a slow query

2015-12-17 Thread Shawn Heisey
On 12/16/2015 9:08 PM, Erick Erickson wrote: > Hmmm, take a look at the individual queries on a shard, i.e. peek at > the Solr logs and see if the fq clause comes through cleanly when you > see =false. I suspect this is just a glitch in assembling the > debug response. If it is, it probably

Re: Strange debug output for a slow query

2015-12-17 Thread Erick Erickson
Yeah, if your warmup times are that long, then either you're having lots of disk I/O contention or something. That said, you've mentioned that after a while the queries are fine. That indicates to me that you aren't autowarming _enough_ and that your slow queries are not pre-loading parts of your

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Erick Erickson
I just tried it (admittedly using just a simple input obviously not a PDF file) and it works perfectly as I'd expect. So a couple of things: 1> what happens if you highlight the content field? The text field should be fine. 2> Did you completely blow away your index whenever you changed the

Re: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Erick Erickson
Andrej: Indeed, it's a doc problem. A long time ago in a Solr far away, there was a bunch of effort to use the "default" collection (collection1). When that was changed, this documentation didn't get updated. We'll update it in a few, thanks for reporting! Erick On Thu, Dec 17, 2015 at 1:39

Re: SolrCloud 4.8.1 - commit wait

2015-12-17 Thread Erick Erickson
Glad to hear it's solved! The suggester stuff is way cool, but can surprise you! Erick On Thu, Dec 17, 2015 at 2:54 AM, Vincenzo D'Amore wrote: > Great!!! Great Erick! It was a buildOnCommit. > > Many thanks for your help. > > > > On Wed, Dec 16, 2015 at 6:30 PM, Erick

Re: Expected mime type application/octet-stream but got text/html

2015-12-17 Thread Chris Hostetter
: : Indeed, it's a doc problem. A long time ago in a Solr far away, there : was a bunch of effort to use the "default" collection (collection1). : When that was changed, this documentation didn't get updated. : : We'll update it in a few, thanks for reporting! Fixed on erick's behalf because he

Re: solr cloud invalid shard/collection configuration

2015-12-17 Thread Shawn Heisey
On 12/14/2015 10:47 PM, ig01 wrote: > We installed solr with solr.cmd -e cloud utility that comes with the > installation. > The names of shards are odd because in this case after the installation > We've migrated an old index from our other environment (which is solr single > node) and split it

Re: Trying to index document in Solr with solr-spark library

2015-12-17 Thread Erick Erickson
Looks like your Spark job is not connecting to the same Zookeeper as your Solr nodes. Or, I suppose, the Solr nodes aren't started. You might get more information on the Cloudera help boards. Best, Erick On Wed, Dec 16, 2015 at 11:58 PM, Guillermo Ortiz wrote: > I'm

Re: A field _indexed_at_tdt added when I index documents.

2015-12-17 Thread Pushkar Raste
You must have this field in your schema with some default value assigned to it (most probably the default value is NOW). This field is usually used to record the timestamp when the document was last indexed. On 17 December 2015 at 04:51, Guillermo Ortiz wrote: > I'm

propagate Query.rewrite call to super.rewrite after 5.4 upgrade

2015-12-17 Thread Markus Jelsma
Hi, Apologies for the cross post. We have a class overriding SpanPositionRangeQuery. It is similar to a SpanFirst query but it is capable of adjusting the boost value with regard to distance. With the 5.4 upgrade the unit tests suddenly threw the following exception: Query class

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Evert R.
Hello Erick, Sorry for my mistakes. Here is everything I got so far: 1. It brings back the result perfectly, but the highlight field is empty, as below: { "responseHeader":{ "status":0, "QTime":15, "params":{ "q":"text:nietava", "debug":"query", "hl":"true",

Re: SolrCloud 4.8.1 - commit wait

2015-12-17 Thread Vincenzo D'Amore
Great!!! Great Erick! It was a buildOnCommit. Many thanks for your help. On Wed, Dec 16, 2015 at 6:30 PM, Erick Erickson wrote: > Quick scan, but probably this: > INFO > o.a.solr.spelling.suggest.Suggester - build() > > The suggester build process can easily take

Re: Issues when indexing PDF files

2015-12-17 Thread Charlie Hull
On 17/12/2015 08:45, Zheng Lin Edwin Yeo wrote: Hi Alexandre, Thanks for your reply. So the only way to solve this issue is to explore with PDF specific tools and change the encoding of the file? Is there any way to configure it in Solr? Solr uses Tika to extract plain text from PDFs. If the

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-17 Thread Evert R.
Hello Teague, Thanks for your reply and tip! I think Solr will give me a better result than just using Tika to read my files and send them to a full-text index in MySQL, which has the specific drawback of not highlighting the text snippets... So, I will keep on trying to fit Solr to my needs, and

Re: Security Problems

2015-12-17 Thread Jan Høydahl
Not just anyone can go "INSERT foo INTO bar" on a random MySQL server in the data room, so why should Solr be less secure once Auth is enabled? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 16. des. 2015 kl. 17.02 skrev Noble Paul : > > I

Slow query response.

2015-12-17 Thread Modassar Ather
Hi, I have a field f which is defined as follows. Solr-5.2.1 is used. The index is spread across 12 shards (no replicas) and the index size on each node is around 100 GB. When I search for 50 thousand values (ORed) in the field f it takes around 45 to 55 seconds. Per my understanding it