Re: query to get parents without children

2015-12-16 Thread Scott Stults
Hi Novin,

How are you associating parents with children? Is it a "children"
multivalued field in the parent record? If so you could query for records
that don't have a value in that field like "-children:[* TO *]"
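
For illustration, a rough sketch of such a query (the collection name is a
placeholder):

  http://localhost:8983/solr/<collection>/select?q=*:*&fq=-children:[* TO *]

The fq clause keeps only records where the multivalued "children" field has
no value at all.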

k/r,
Scott

On Wed, Dec 16, 2015 at 7:29 AM, Novin Novin  wrote:

> Hi guys,
>
> I have a few parent documents indexed without children. What would be the
> query to get those?
>
> Thanks,
> Novin
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Re: Security Problems

2015-12-16 Thread Noble Paul
I have opened https://issues.apache.org/jira/browse/SOLR-8429
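
For illustration, a rough sketch of what the proposed flag could look like in
security.json (the flag name is the one proposed below; the BasicAuthPlugin
class is only an assumption for the example):

  {
    "authentication": {
      "class": "solr.BasicAuthPlugin",
      "blockUnauthenticated": true
    },
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin"
    }
  }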

On Wed, Dec 16, 2015 at 9:32 PM, Noble Paul  wrote:
> I don't think this behavior is intuitive. It is very easy to misunderstand.
>
> I would rather just add a flag to "authentication" plugin section
> which says "blockUnauthenticated" : true
>
> which means all unauthenticated requests must be blocked.
>
>
>
>
> On Tue, Dec 15, 2015 at 7:09 PM, Jan Høydahl  wrote:
>> Yes, that’s why I believe it should be:
>> 1) if only authentication is enabled, all users must authenticate and all 
>> authenticated users can do anything.
>> 2) if authz is enabled, then all users must still authenticate, and can by 
>> default do nothing at all, unless assigned proper roles
>> 3) if a user is assigned the default “read” rule, and a collection adds a 
>> custom “/myselect” handler, that one is unavailable until the user gets it 
>> assigned
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>>> On 14 Dec 2015 at 14:15, Noble Paul wrote:
>>>
>>> ". If all paths were closed by default, forgetting to configure a path
>>> would not result in a security breach like today."
>>>
>>> But it will still mean that unauthorized users are able to access,
>>> like guest being able to post to "/update". Just authenticating is not
>>> enough without proper authorization
>>>
>>> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
> 1) "read" should cover all the paths

 This is very fragile. If all paths were closed by default, forgetting to 
 configure a path would not result in a security breach like today.

 /Jan
>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul
>>
>
>
>
> --
> -
> Noble Paul



-- 
-
Noble Paul


RE: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Teague James
Sorry to hear that didn't work! Let me ask a couple of questions...

Have you tried the analyzer inside of the Admin Interface? It has helped me 
sort out a number of highlighting issues in the past. To access it, go to your 
Admin interface, select your core, then select Analysis from the list of 
options on the left. In the analyzer, enter the term you are indexing in the 
top-left input field (in other words, the term in the document you are indexing 
that you expect to get a hit on) and your query term in the right input field. 
Select the field that they are destined for (in your case that would be 
'content'), then hit Analyze. It helps if you have a big screen!

This will show you the impact of the various filter factories that you have 
engaged and their effect on whether or not a 'hit' is being generated. Hits are 
identified by a very faint highlight. (PSST... Developers... It would be really 
cool if the highlight color were more visible or customizable... Thanks y'all) 
If it looks like you're getting hits, but not getting highlighting, then open 
up a new tab with the Admin's query interface. Same place on the left as the 
analyzer. Replace the "*:*" with your search term (assuming you already indexed 
your document) and if necessary you can put something in the FQ like 
"id:123456" to target a specific record.

Did you get a hit? If no, then it's not highlighting that's the issue. If yes, 
then try dumping this in your address bar (using your URL/IP, search term, and 
core name of course. The fq= is an example) :
http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"

That will dump Solr's output to your browser where you can see exactly what is 
getting hit.

Hope that helps! Let me know how it goes. Good luck.

-Teague

-Original Message-
From: Evert R. [mailto:evert.ra...@gmail.com] 
Sent: Wednesday, December 16, 2015 1:46 PM
To: solr-user 
Subject: Re: Solr Basic Configuration - Highlight - Begginer

Hi Teague!

I configured the solrconfig.xml and schema.xml exactly the way you did, only 
substituting the word 'documentText' with 'content' as used by the techproducts 
sample. I reindexed with:

 curl '
http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true'
-F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

with the same result: no highlighting in the response, as below:

"highlighting": { "pdf1": {} }

=(

Really... do not know what to do...

Thanks for your time. If you have any more suggestions about where I could be 
missing something... please let me know.


Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James :

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the 
> FastVectorHighlighter which worked out great. I just made a change to 
> the configuration to add generateWordParts="0" and 
> generateNumberParts="0" so that searches for things like "1a" would 
> get highlighted correctly. You may or may not need that feature. You 
> can always remove them or change the value to "1" to switch them on 
> explicitly. Anyway, hope this helps!
>
> solrconfig.xml (partial snip)
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="wt">xml</str>
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">documentText</str>
>     <str name="hl">on</str>
>     <str name="hl.fl">text</str>
>     <bool name="hl.useFastVectorHighlighter">true</bool>
>     <int name="hl.fragsize">100</int>
>   </lst>
> </requestHandler>
>
> schema.xml (partial snip)
> <field name="id" type="string" indexed="true" stored="true"
>   required="true" multiValued="false" />
> <field name="documentText" type="text_fvh" indexed="true" stored="true"
>   multivalued="true" termVectors="true" termOffsets="true"
>   termPositions="true" />
>
> <fieldType name="text_fvh" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>       preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>       ignoreCase="true" expand="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>       preserveOriginal="1" generateWordParts="0" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" />
>   </analyzer>
> </fieldType>
>
> -Teague
>
> From: Evert R. [mailto:evert.ra...@gmail.com]
> Sent: Tuesday, December 15, 2015 6:25 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Basic Configuration - Highlight - Beginner
>
> Hi there!
>
> It's my first installation; not sure if this is the right channel...
>
> Here are my steps:
>
> 1. Set up a basic install of solr 5.4.0
>
> 2. Create a new core through command line (bin/solr create -c test)
>
> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>
> 4. Query over the browser and it brings the correct search, but it 
> does not show the part of the text I am querying: the highlight.
>
>   I have already flagged the 'hl' option. But still it does not work...
>
> Example: I am looking for the word 'peace' in my pdf file 

Re: Append fields to a document

2015-12-16 Thread Jack Krupansky
What is the nature of your documents that reproducing them is so expensive?
Whatever it is, you should spend some time trying to reduce it to something
more manageable and performant. Generally, the primary recommendation is to
simply reindex any documents that need to be updated since atomic update
has various caveats so that it is only useful in a subset of use cases.

-- Jack Krupansky

On Wed, Dec 16, 2015 at 10:09 AM, Jamie Johnson  wrote:

> I have a use case where we only need to append some fields to a document.
> To retrieve the full representation is very expensive but I can easily get
> the deltas.  Is it possible to just add fields to an existing Solr
> document?  I experimented with using overwrite=false, but that resulted in
> two documents with the same uniqueKey in the index (which makes sense).  Is
> there a way to accomplish what I'm looking to do in Solr?  My fields aren't
> all stored and I think it will be too expensive for me to make that change.
> Any thoughts would be really appreciated.
>


Re: query to get parents without children

2015-12-16 Thread Novin Novin
Hi Scott,

Actually, it is not a multivalued field; it is a nested document.

Novin

On 16 December 2015 at 20:33, Scott Stults <
sstu...@opensourceconnections.com> wrote:

> Hi Novin,
>
> How are you associating parents with children? Is it a "children"
> multivalued field in the parent record? If so you could query for records
> that don't have a value in that field like "-children:[* TO *]"
>
> k/r,
> Scott
>
> On Wed, Dec 16, 2015 at 7:29 AM, Novin Novin  wrote:
>
> > Hi guys,
> >
> > I have a few parent documents indexed without children. What would be the
> > query to get those?
> >
> > Thanks,
> > Novin
> >
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
>


Re: Solr 6 Distributed Join

2015-12-16 Thread Akiel Ahmed
Hi Dennis,

Thank you for your help. I used your explanation to construct an innerJoin 
query; I think I am getting further but didn't get the results I expected. 
The following describes what I did – is there any chance you can tell 
where I am going wrong:

Solr 6 Developer Builds: #2738 and #2743

1. Modified server/solr/configsets/basic_configs/conf/managed-schema so it 
reads:



[managed-schema snippet lost in the archive: it declared "id" as the uniqueKey
and defined the fields id, type, e1, e2 and text used by the data below.]

2. Modified server/solr/configsets/basic_configs/conf/solrconfig.xml, 
adding the following near the bottom of the file so it is the last request 
handler

   
 
<requestHandler name="/stream" class="solr.StreamHandler" startup="lazy">
  <lst name="invariants">
    <str name="wt">json</str>
    <str name="distrib">false</str>
  </lst>
</requestHandler>

3. Used solr -e cloud to setup a solr cloud instance, picking all the 
defaults except I chose basic_configs

4. After solr is running I ingested the following data via the Solr Web UI 
(/update handler, Document Type = CSV)
id,type,e1,e2,text
1,ABC,,,John Smith
2,ABC,,,Jane Smith
3,ABC,,,MiKe Smith
4,ABC,,,John Doe
5,ABC,,,Jane Doe
6,ABC,,,MiKe Doe
7,ABC,,,John Smith
8,DEF,,,Chicken Burger
9,DEF,,,Veggie Burger
10,DEF,,,Beef Burger
11,DEF,,,Chicken Donar
12,DEF,,,Chips
13,DEF,,,Drink
20,GHI,1,2,Friends
21,GHI,3,4,Friends
22,GHI,5,6,Friends
23,GHI,7,6,Friends
24,GHI,6,4,Friends
25,JKL,1,8,Order
26,JKL,2,9,Order
27,JKL,3,10,Order
28,JKL,4,11,Order
29,JKL,5,12,Order
30,JKL,6,13,Order

5. Navigating to the following URL in a browser returned an expected 
result:
http://localhost:8983/solr/gettingstarted/select?q={!join from=id to=e1}text:John&fl=id


<response>
  ...
  <result name="response">
    <doc>
      <str name="id">20</str>
      <str name="e1">1</str>
      <str name="e2">2</str>
      ...
    </doc>
    <doc>
      <str name="id">28</str>
      <str name="e1">4</str>
      <str name="e2">11</str>
      ...
    </doc>
    <doc>
      <str name="id">23</str>
      <str name="e1">7</str>
      <str name="e2">6</str>
      ...
    </doc>
  </result>
</response>


6. Navigating to the following URL in a browser does NOT return what I 
expected:
http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted
, fl="id", q=text:John, sort="id 
asc",zkHost="localhost:9983",qt="/export"), search(gettingstarted, 
fl="id", q=text:Friends, sort="id 
asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

{"result-set":{"docs":[
{"EOF":true,"RESPONSE_TIME":124}]}}


I also have a join related question. Is there any chance I can specify a 
query and join for more than 2 things? For example:

innerJoin(search(gettingstarted, fl="id", q=text:John, ...) as s1, 
  search(gettingstarted, fl="id", q=text:Chicken, ...) as s2
  search(gettingstarted, fl="id", q=text:Friends, ...) as s3)
  on="s1.id=s3.e1", 
  on="s2.id=s3.e2")
 
Sorry if the query does not make sense, but given the data above my 
intention is to find a single result made up of 3 documents: 
s1.id=1,s2.id=8,s3.id=25
Is that possible? If yes, will Solr 6 support an arbitrary number of 
queries and associated joins?

Cheers

Akiel



From:   Dennis Gove 
To: Akiel Ahmed/UK/IBM@IBMGB, solr-user@lucene.apache.org
Date:   11/12/2015 15:34
Subject:Re: Solr 6 Distributed Join



Akiel,

Without seeing your full url I assume that you're missing the
stream=innerJoin(...) part of it. A full sample url would look like this:
http://localhost:8983/solr/careers/stream?stream=innerJoin(search(careers,
fl="personId,companyId,title", q=companyId:*, sort="companyId
asc",zkHost="localhost:2181",qt="/export"),search(companies,
fl="id,companyName", q=*:*, sort="id
asc",zkHost="localhost:2181",qt="/export"),on="companyId=id")

This example will return a join of career records with the company name 
for
all career records with a non-null companyId.

And the pieces have the following meaning:
http://localhost:8983/solr/careers/stream?  - you have a collection called
careers available on localhost:8983 and you're hitting its stream handler
?stream=  - you are passing the stream parameter to the stream handler
zkHost="localhost:2181"  - there is a zk instance running on 
localhost:2181
where solr can get clusterstate information. Note, that since you're
sending the request to the careers collection this param is not required 
in
the search(careers) part but is required in the search(companies)
part. For simplicity I usually just provide it for all.
qt="/export"  - tells solr to use the export handler. this assumes all 
your
fields are in docValues. if you'd rather not use the export handler then
you probably want to provide the rows=# param to tell solr to return a
large # of rows for each underlying search. Without it solr will default
to, I believe, 10 rows.
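
For example, a non-export variant of one of the searches above might look
roughly like this (the rows value is arbitrary and must cover the expected
result size):

  search(careers, q=companyId:*, fl="personId,companyId,title",
         sort="companyId asc", zkHost="localhost:2181", qt="/select",
         rows=1000000)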

CCing the user list so others can see this as well.

We're working on additional documentation for Streaming Aggregation and
Expressions. The page can be found at
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions but
it's missing a lot of things we've added recently.

- Dennis

On Fri, Dec 11, 2015 at 9:51 AM, Akiel Ahmed  wrote:

> Hi,
>
> Sorry, this is out of the blue - I have joined the Solr mailing list, but
> I don't know if it is the correct place to ask my question. 

Re: Append fields to a document

2015-12-16 Thread Erick Erickson
The only way to do this currently is with Atomic Updates, which
require all fields to be stored except the destinations of copyField
directives. See:

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

Best,
Erick
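
For reference, an atomic update that appends a value looks roughly like this
(collection, id and field names are placeholders):

  curl 'http://localhost:8983/solr/<collection>/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"id": "doc1", "tags": {"add": "new-value"}}]'

The "add" operation appends to a multivalued field and "set" replaces a
value, but again, this only works when the stored-field requirements above
are met.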

On Wed, Dec 16, 2015 at 7:09 AM, Jamie Johnson  wrote:
> I have a use case where we only need to append some fields to a document.
> To retrieve the full representation is very expensive but I can easily get
> the deltas.  Is it possible to just add fields to an existing Solr
> document?  I experimented with using overwrite=false, but that resulted in
> two documents with the same uniqueKey in the index (which makes sense).  Is
> there a way to accomplish what I'm looking to do in Solr?  My fields aren't
> all stored and I think it will be too expensive for me to make that change.
> Any thoughts would be really appreciated.


RE: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Teague James
Hi Evert,

I recently needed help with phrase highlighting and was pointed to the 
FastVectorHighlighter which worked out great. I just made a change to the 
configuration to add generateWordParts="0" and generateNumberParts="0" so that 
searches for things like "1a" would get highlighted correctly. You may or may 
not need that feature. You can always remove them or change the value to "1" to 
switch them on explicitly. Anyway, hope this helps!

solrconfig.xml (partial snip)

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">xml</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">documentText</str>
    <str name="hl">on</str>
    <str name="hl.fl">text</str>
    <bool name="hl.useFastVectorHighlighter">true</bool>
    <int name="hl.fragsize">100</int>
  </lst>
</requestHandler>

schema.xml (partial snip)

<field name="id" type="string" indexed="true" stored="true"
  required="true" multiValued="false" />
<field name="documentText" type="text_fvh" indexed="true" stored="true"
  multivalued="true" termVectors="true" termOffsets="true"
  termPositions="true" />

<fieldType name="text_fvh" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
      preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
      ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
      preserveOriginal="1" generateWordParts="0" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt" />
  </analyzer>
</fieldType>

-Teague

From: Evert R. [mailto:evert.ra...@gmail.com] 
Sent: Tuesday, December 15, 2015 6:25 AM
To: solr-user@lucene.apache.org
Subject: Solr Basic Configuration - Highlight - Beginner

Hi there!

It's my first installation; not sure if this is the right channel...

Here are my steps:

1. Set up a basic install of solr 5.4.0

2. Create a new core through command line (bin/solr create -c test)

3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)

4. Query over the browser and it brings the correct search, but it does not 
show the part of the text I am querying: the highlight. 

  I have already flagged the 'hl' option. But still it does not work...

Example: I am looking for the word 'peace' in my pdf file (book). I have 4 
matches for this word; it shows me the book name (pdf file) but does not show 
which part of the text contains the word 'peace'.


I am probably missing some configuration in schema.xml, which is missing from 
my folder /solr/server/solr/test/conf/

Or even the solrconfig.xml...

I have read a bunch of things about highlighting and checked these files, 
copied the standard schema.xml to my core/conf folder, but still it does not 
bring the highlight.


Attached a copy of my solrconfig.xml file.


I am very sorry for this probably dumb and too basic question... It is the 
first time I have seen Solr live.


Any help will be appreciated.



Best regards,


Evert Ramos

mailto:evert.ra...@gmail.com




Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Erick,

I think you are right!

When I use the form 'features:accents', in my case 'content:nietava', it
shows as if there were no matching words... but if I take the field off,
having only 'q=searchword' (q=nietava), it brings the pdf content file,
as below (in XML output):

#partial snip:


Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândid
So, using:

1. q=content:nietava&hl=true&hl.fl=content  -> results:



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="q">content:nietava</str>
      <str name="hl">true</str>
      <str name="hl.fl">content</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

2. q=nietava&hl=true&hl.fl=content  -> results:



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">93</int>
    <lst name="params">
      <str name="q">nietava</str>
      <str name="hl">true</str>
      <str name="hl.fl">content</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">pdf1</str>
      <date name="last_modified">2011-07-28T20:39:26Z</date>
      <arr name="title">
        <str>Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc</str>
      </arr>
      <arr name="content_type"><str>application/pdf</str></arr>
      <arr name="author"><str>Wander</str></arr>
      <str name="author_s">Wander</str>
      <arr name="content"><str>
Microsoft Word - André Luiz - Sexo e Destino _Chico e Waldo_.doc Francisco
Cândido Xavier e Waldo Vieira Sexo e Destino 12o livro da Coleção “A Vida
no Mundo Espiritual” Ditado pelo Espírito André Luiz FEDERAÇÃO ESPÍRITA
BRASILEIRA DEPARTAMENTO EDITORIAL Rua Souza Valente, 17 20941-040 - Rio -
RJ - Brasil http://www.febnet.org.br/ Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz 2 Coleção “A Vida no Mundo Espiritual”
01 - Nosso Lar 02 - Os Mensageiros 03 - Missionários da Luz 04 - Obreiros
da Vida Eterna 05 - No Mundo Maior 06 - Libertação 07 - Entre a Terra e o
Céu 08 - Nos Domínios da Mediunidade 09 - Ação e Reação 10 - Evolução em
Dois Mundos 11 - Mecanismos da Mediunidade 12 - Sexo e Destino 13 - E a
Vida Continua... Francisco Cândido Xavier - ...(long text,
including the word 'nietava')
      </str></arr>
      <long name="_version_">1520731379641352192</long>
    </doc>
  </result>
</response>

 =(

Thanks!

*Evert*

2015-12-16 15:17 GMT-02:00 Erick Erickson :

> Ok, you're getting confused by all the options, an easy thing to do.
> You're trying to do too many things at once without making sure
> the basics work
>
> 1> Forget all about the f.content.hl stuff. That's there in case
> you want to specify different parameters for different fields in the same
> highlight request. That's an advanced option for later
>
> 2> start with the basic techproducts example. Then this should show
> you highlights:
> q=features:accents&hl=true&hl.fl=features
>
> That's about as basic as you get. It's searching for "accents" in the
> features field and returning highlights on the features field.
>
> Once that's working, _then_ refine.
>
> Best,
> Erick
>
> On Wed, Dec 16, 2015 at 8:21 AM, Evert R.  wrote:
> > Hi Andrea,
> >
> > ok, let´s do it:
> >
> > 1. it does have the 'nietava' term, so it brings the only book (pdf file)
> > that has this word, and all its content as in my previous message to Erick,
> > so the content field is there.
> >
> > 2. using content:nietava it does not show any result as below:
> >
> > { "responseHeader": { "status": 400, "QTime": 12, "params": { "q":
> > "contents:nietava", "indent": "true", "fl": "id", "wt": "json", "_":
> > "1450282631352" } }, "error": { "msg": "undefined field contents",
> "code":
> > 400 } }
> >
> > 3. Here is what I found when grepping 'content' from the techproducts
> conf
> > folder:
> >
> > schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> > schema.xml:   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
> > schema.xml:   <copyField source="content" dest="text"/>
> > schema.xml:   <copyField source="content_type" dest="text"/>
> > solrconfig.xml:   <str name="facet.field">content_type</str>
> > solrconfig.xml:   <str name="hl.fl">content features title name</str>
> > solrconfig.xml:   <str name="f.content.hl.snippets">3</str>
> > solrconfig.xml:   <str name="f.content.hl.fragsize">200</str>
> > solrconfig.xml:   <str name="f.content.hl.alternateField">content</str>
> > solrconfig.xml:   <str name="f.content.hl.maxAlternateFieldLength">750</str>
> > solrconfig.xml:   <str name="stream.contentType">application/json</str>
> > solrconfig.xml:   <str name="stream.contentType">application/csv</str>
> > solrconfig.xml:   <str name="content-type">text/plain; charset=UTF-8</str>
> >
> > and the grep on 'content_type':
> >
> > schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> > schema.xml:   <copyField source="content_type" dest="text"/>
> > solrconfig.xml:   <str name="facet.field">content_type</str>
> >
> > =)
> >
> > Thanks for checking out.
> >
> >
> >
> > *Evert *
> >
> > 2015-12-16 12:59 GMT-02:00 Andrea Gazzarini :
> >
> >> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
> >>
> >>- First, sorry, the obvious question: are you sure the documents
> contain
> >>the "nietava" term?
> >>- Could you try 

Re: Solr cloud instance does not read cores from Zookeeper whilst connected

2015-12-16 Thread Erick Erickson
At a random guess, how are you starting Zookeeper and Solr?
Is it possible that you're running the Zookeeper embedded in Solr
but have an external Zookeeper running also? In that scenario
you might be seeing one Zookeeper in the admin UI and another
when trying to create the collection.

Could you cut/paste the _exact_ commands you use to start Solr and
create the collection?

Best,
Erick
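
For reference, a typical command pair looks something like this (the IP comes
from the logs below; the collection name and paths are placeholders):

  bin/solr start -c -z 172.31.11.65:2181
  bin/solr create -c connects -d /path/to/configset

Note that "bin/solr start -c" without -z launches the embedded ZooKeeper on
port 9983, which is an easy way to end up talking to two different ZooKeepers.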

On Wed, Dec 16, 2015 at 4:24 AM, Andrej van der Zee
 wrote:
> Hi,
>
> I have set up ZooKeeper and uploaded a collection config. But somehow it seems
> that Solr keeps reading core definitions locally ("Looking for core
> definitions underneath /opt/solr/server/solr") instead of getting them from
> ZooKeeper. Below are the logs.
>
> Probably some kind of config thingy; unfortunately I cannot tell from the
> documentation what is missing.
>
> In the Solr GUI I can see the Cloud tree on Zookeeper. But I can see this
> WARN in the GUI:
> 12/16/2015, 1:22:45 PMWARNnullZookeeperInfoServletState for collection
> connects not found in /clusterstate.json or
> /collections/connects/state.json!
>
> Is that related?
>
> Thanks,
> Andrej
>
>
>
> Starting Solr in SolrCloud mode on port 8983 from /opt/solr/server
>
> 0INFO  (main) [   ] o.e.j.u.log Logging initialized @269ms
> 265  INFO  (main) [   ] o.e.j.s.Server jetty-9.2.11.v20150529
> 278  WARN  (main) [   ] o.e.j.s.h.RequestLogHandler !RequestLog
> 279  INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider Deployment monitor
> [file:/opt/solr/server/contexts/] at interval 0
> 586  INFO  (main) [   ] o.e.j.w.StandardDescriptorProcessor NO JSP Support
> for /solr, did not find org.apache.jasper.servlet.JspServlet
> 595  WARN  (main) [   ] o.e.j.s.SecurityHandler
> ServletContext@o.e.j.w.WebAppContext@57fffcd7{/solr,file:/opt/solr/server/solr-webapp/webapp/,STARTING}{/opt/solr/server/solr-webapp/webapp}
> has uncovered http methods for path: /
> 599  INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> SolrDispatchFilter.init(): WebAppClassLoader=1926764753@72d818d1
> 609  INFO  (main) [   ] o.a.s.c.SolrResourceLoader JNDI not configured for
> solr (NoInitialContextEx)
> 609  INFO  (main) [   ] o.a.s.c.SolrResourceLoader using system property
> solr.solr.home: /opt/solr/server/solr
> 610  INFO  (main) [   ] o.a.s.c.SolrResourceLoader new SolrResourceLoader
> for directory: '/opt/solr/server/solr/'
> 719  WARN  (main) [   ] o.a.s.s.SolrDispatchFilter Solr property
> solr.solrxml.location is no longer supported. Will automatically load
> solr.xml from ZooKeeper if it exists
> 725  INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using default
> ZkCredentialsProvider
> 744  INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to
> connect to ZooKeeper
> 800  INFO  (zkCallback-1-thread-1) [   ] o.a.s.c.c.ConnectionManager
> Watcher org.apache.solr.common.cloud.ConnectionManager@e250cde
> name:ZooKeeperConnection Watcher:172.31.11.65:2181 got event WatchedEvent
> state:SyncConnected type:None path:null path:null type:None
> 800  INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client is connected to
> ZooKeeper
> 800  INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using default ZkACLProvider
> 805  INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from
> SolrHome (not found in ZooKeeper)
> 807  INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container
> configuration from /opt/solr/server/solr/solr.xml
> 852  INFO  (main) [   ] o.a.s.c.CoresLocator Config-defined core root
> directory: /opt/solr/server/solr
> 867  INFO  (main) [   ] o.a.s.c.CoreContainer New CoreContainer 510063093
> 867  INFO  (main) [   ] o.a.s.c.CoreContainer Loading cores into
> CoreContainer [instanceDir=/opt/solr/server/solr/]
> 867  INFO  (main) [   ] o.a.s.c.CoreContainer loading shared library:
> /opt/solr/server/solr/lib
> 868  WARN  (main) [   ] o.a.s.c.SolrResourceLoader No files added to
> classloader from lib: lib (resolved as: /opt/solr/server/solr/lib).
> 879  INFO  (main) [   ] o.a.s.h.c.HttpShardHandlerFactory created with
> socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost :
> 20,maxConnections : 1,corePoolSize : 0,maximumPoolSize :
> 2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy :
> false,useRetries : false,
> 1009 INFO  (main) [   ] o.a.s.u.UpdateShardHandler Creating
> UpdateShardHandler HTTP client with params:
> socketTimeout=60=6=true
> 1010 INFO  (main) [   ] o.a.s.l.LogWatcher SLF4J impl is
> org.slf4j.impl.Log4jLoggerFactory
> 1011 INFO  (main) [   ] o.a.s.l.LogWatcher Registering Log Listener [Log4j
> (org.slf4j.impl.Log4jLoggerFactory)]
> 1012 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper client=
> 172.31.11.65:2181
> 1027 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to
> connect to ZooKeeper
> 1031 INFO  (zkCallback-3-thread-1-processing-n:172.31.11.63:8983_solr) [
> ] o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@1de3d8b2
> name:ZooKeeperConnection 

Re: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Evert R.
Hi Teague!

I configured the solrconfig.xml and schema.xml exactly the way you did, only
substituting the word 'documentText' with 'content' as used by the techproducts
sample. I reindexed with:

 curl '
http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true'
-F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

with the same result: no highlighting in the response, as below:

"highlighting": { "pdf1": {} }

=(

Really... do not know what to do...

Thanks for your time. If you have any more suggestions about where I could be
missing something... please let me know.


Best regards,

*Evert*

2015-12-16 15:30 GMT-02:00 Teague James :

> Hi Evert,
>
> I recently needed help with phrase highlighting and was pointed to the
> FastVectorHighlighter which worked out great. I just made a change to the
> configuration to add generateWordParts="0" and generateNumberParts="0" so
> that searches for things like "1a" would get highlighted correctly. You may
> or may not need that feature. You can always remove them or change the
> value to "1" to switch them on explicitly. Anyway, hope this helps!
>
> solrconfig.xml (partial snip)
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="wt">xml</str>
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">documentText</str>
>     <str name="hl">on</str>
>     <str name="hl.fl">text</str>
>     <bool name="hl.useFastVectorHighlighter">true</bool>
>     <int name="hl.fragsize">100</int>
>   </lst>
> </requestHandler>
>
> schema.xml (partial snip)
> <field name="id" type="string" indexed="true" stored="true"
>   required="true" multiValued="false" />
> <field name="documentText" type="text_fvh" indexed="true" stored="true"
>   multivalued="true" termVectors="true" termOffsets="true"
>   termPositions="true" />
>
> <fieldType name="text_fvh" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>       preserveOriginal="1" generateNumberParts="0" generateWordParts="0" />
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>       ignoreCase="true" expand="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>       preserveOriginal="1" generateWordParts="0" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" />
>   </analyzer>
> </fieldType>
>
> -Teague
>
> From: Evert R. [mailto:evert.ra...@gmail.com]
> Sent: Tuesday, December 15, 2015 6:25 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Basic Configuration - Highlight - Beginner
>
> Hi there!
>
> It's my first installation; not sure if this is the right channel...
>
> Here are my steps:
>
> 1. Set up a basic install of solr 5.4.0
>
> 2. Create a new core through command line (bin/solr create -c test)
>
> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>
> 4. Query over the browser and it brings the correct search, but it does
> not show the part of the text I am querying: the highlight.
>
>   I have already flagged the 'hl' option. But still it does not work...
>
> Example: I am looking for the word 'peace' in my pdf file (book). I have 4
> matches for this word; it shows me the book name (pdf file) but does not
> show which part of the text contains the word 'peace'.
>
>
> I am probably missing some configuration in schema.xml, which is missing
> from my folder /solr/server/solr/test/conf/
>
> Or even the solrconfig.xml...
>
> I have read a bunch of things about highlighting and checked these files,
> copied the standard schema.xml to my core/conf folder, but still it does not
> bring the highlight.
>
>
> Attached a copy of my solrconfig.xml file.
>
>
> I am very sorry for this probably dumb and too basic question... It is the
> first time I have seen Solr live.
>
>
> Any help will be appreciated.
>
>
>
> Best regards,
>
>
> Evert Ramos
>
> mailto:evert.ra...@gmail.com
>
>
>


Re: Solr Basic Configuration - Highlight - Beginner

2015-12-16 Thread Erick Erickson
Ok, you're getting confused by all the options, an easy thing to do.
You're trying to do too many things at once without making sure
the basics work

1> Forget all about the f.content.hl stuff. That's there in case
you want to specify different parameters for different fields in the same
highlight request. That's an advanced option for later

2> start with the basic techproducts example. Then this should show
you highlights:
q=features:accents&hl=true&hl.fl=features

That's about as basic as you get. It's searching for "accents" in the
features field and returning highlights on the features field.

Once that's working, _then_ refine.

Best,
Erick

On Wed, Dec 16, 2015 at 8:21 AM, Evert R.  wrote:
> Hi Andrea,
>
> ok, let´s do it:
>
> 1. it does have the 'nietava' term, so it brings the only book (pdf file)
> that has this word, and all its content as in my previous message to Erick,
> so the content field is there.
>
> 2. using content:nietava it does not show any result as below:
>
> { "responseHeader": { "status": 400, "QTime": 12, "params": { "q":
> "contents:nietava", "indent": "true", "fl": "id", "wt": "json", "_":
> "1450282631352" } }, "error": { "msg": "undefined field contents", "code":
> 400 } }
>
> 3. Here is what I found when grepping 'content' from the techproducts conf
> folder:
>
> schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> schema.xml:   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
> schema.xml:   <copyField source="content" dest="text"/>
> schema.xml:   <copyField source="content_type" dest="text"/>
> solrconfig.xml:   <str name="facet.field">content_type</str>
> solrconfig.xml:   <str name="hl.fl">content features title name</str>
> solrconfig.xml:   <str name="f.content.hl.snippets">3</str>
> solrconfig.xml:   <str name="f.content.hl.fragsize">200</str>
> solrconfig.xml:   <str name="f.content.hl.alternateField">content</str>
> solrconfig.xml:   <str name="f.content.hl.maxAlternateFieldLength">750</str>
> solrconfig.xml:   <str name="stream.contentType">application/json</str>
> solrconfig.xml:   <str name="stream.contentType">application/csv</str>
> solrconfig.xml:   <str name="content-type">text/plain; charset=UTF-8</str>
>
> and the grep on 'content_type':
>
> schema.xml:   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
> schema.xml:   <copyField source="content_type" dest="text"/>
> solrconfig.xml:   <str name="facet.field">content_type</str>
>
> =)
>
> Thanks for checking out.
>
>
>
> *Evert *
>
> 2015-12-16 12:59 GMT-02:00 Andrea Gazzarini :
>
>> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
>>
>>- First, sorry, the obvious question: are you sure the documents contain
>>the "nietava" term?
>>- Could you try to use q=content:nietava?
>>- Could you paste the definition (field & fieldtype) of the content
>>field?
>>
>> > Should I have this configuration in the XML file?
>>
>> You could, but it's up to you and it strongly depends on your context. The
>> simple thing is that if you have those parameters within the configuration
>> you can avoid to pass them (as part of the requests), but probably in this
>> phase, where you are testing, it's better to have them there (in the
>> request).
>>
>> Andrea
>>
>> 2015-12-16 15:28 GMT+01:00 Evert R. :
>>
>> > Hi Andrea,
>> >
>> > Thanks for the reply!
>> >
>> > I tried with the hl.fl parameter as well, using as below:
>> >
>> >
>> >
>> http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&
>> >
>> >
>> hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>> >
>> > with the parameter under the hl field in the solr ui:
>> >
>> > 1. f.content.hl.snnipets=2
>> > 2. f.content.hl.content=4
>> > 3. content
>> >
>> > with no success...
>> >
>> > Should I have this configuration in the XML file?
>> >
>> > Regards,
>> >
>> > *Evert *
>> >
>> > 2015-12-16 11:23 GMT-02:00 Andrea Gazzarini :
>> >
>> > > Hi Evert,
>> > > what is the configuration of the default request handler? Did you set
>> the
>> > > hl.fl parameter?
>> > >
>> > > Please check here [1] the parameters that the highlighting component
>> > > expects. Required parameters should be in the query string or declared
>> > > within the request handler which answers to your query.
>> > >
>> > > Andrea
>> > >
>> > > [1] https://wiki.apache.org/solr/HighlightingParameters
>> > >
>> > >
>> > >
>> > >
>> > > 2015-12-16 12:51 GMT+01:00 Evert R. :
>> > >
>> > > > Hi everyone!
>> > > >
>> > > > I think I should not have posted my server name... never had that
>> many
>> > > > access attempts...
>> > > >
>> > > >
>> > > >
>> > > > 2015-12-16 9:03 GMT-02:00 Evert R. :
>> > > >
>> > > > > Hello Erick,
>> > > > >
>> > > > > Thanks again for your time.
>> > > > >
>> > > > > Here is as far as I have gone:
>> > > > >
>> > > > > 1. I started a fresh install and did the following:
>> > > > >
>> > > > > [evert@nix]$ bin/solr start -e techproducts
>> > > > > [evert@nix]$ curl '
>> > > > >
>> > > >
>> > >
>> >
>> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true
>> > > > '
>> > > > > -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>> > > > >
>> 

Re: Solr High Availability

2015-12-16 Thread Upayavira
If you have two replicas (one leader/one replica) for each shard of your
collection, and you ensure that no two replicas are on the same node,
and you have three independent Zookeeper nodes, then yes, you should
have HA.

Upayavira
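
For example, a Collections API call creating such a layout might look like
this (names and counts are placeholders):

  /admin/collections?action=CREATE&name=mycollection&numShards=5
    &replicationFactor=2&maxShardsPerNode=1

maxShardsPerNode=1 forces every replica onto its own node, which requires at
least numShards * replicationFactor nodes.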

On Wed, Dec 16, 2015, at 05:48 PM, Peter Tan wrote:
> Hi Jack,
> 
> Appreciate you helping me to clear this up.
> 
> For replicationFactor = 1, that means only keeping one copy of each
> document in the cluster.
> 
> Currently, for our SolrCloud setup, we have two replicas (primary and
> replica) per shard (total of 5 shards).  This should achieve HA
> already, correct?
> 
> 
> 
> On Tue, Dec 15, 2015 at 10:09 PM, Jack Krupansky
> 
> wrote:
> 
> > There is no HA with a single replica for each shard. Replication factor
> > must be at least 2 for HA.
> >
> > -- Jack Krupansky
> >
> > On Wed, Dec 16, 2015 at 12:38 AM, Peter Tan  wrote:
> >
> > > Hi Jack, What happens when there is only one replica setup?
> > >
> > > On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky <
> > jack.krupan...@gmail.com>
> > > wrote:
> > >
> > > > Solr Cloud provides HA when you configure at least two replicas for
> > each
> > > > shard and have at least 3 zookeepers. That's it. No deck or detail
> > > document
> > > > is needed.
> > > >
> > > >
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Tue, Dec 15, 2015 at 9:07 PM, 
> > > > wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > Can you help me in understanding in achieving the Solr High
> > > Availability
> > > > .
> > > > >
> > > > > Appreciate you have a detail document or Deck on more details.
> > > > >
> > > > > Thank you
> > > > > Viswanath Bharathi
> > > > > Accenture | Delivery Centres for Technology in India
> > > > > CDC 2, Chennai, India
> > > > > Mobile: +91 9886259010
> > > > > www.accenture.com | www.avanade.com<
> > > > > http://www.avanade.com/>
> > > > >
> > > > >
> > > >
> > >
> >


Re: Solr High Availability

2015-12-16 Thread Peter Tan
Thanks for the response.

There were a few occurrences in our SolrCloud cluster where, when a primary
went down in a shard, the replica didn't get promoted, which eventually led
to downtime.  We had to restart ZooKeeper services (we have three ZooKeeper
nodes) to promote the replica to primary.

But I just want to make sure our setup is correct.

On Wed, Dec 16, 2015 at 10:01 AM, Upayavira  wrote:

> If you have two replicas (one leader/one replica) for each shard of your
> collection, and you ensure that no two replicas are on the same node,
> and you have three independent Zookeeper nodes, then yes, you should
> have HA.
>
> Upayavira
>
> On Wed, Dec 16, 2015, at 05:48 PM, Peter Tan wrote:
> > Hi Jack,
> >
> > Appreciate you helping me to clear this up.
> >
> > For replicationFactor = 1, that means only keeping one copy of each
> > document in the cluster.
> >
> > Currently, for our SolrCloud setup, we have two replicas (primary and
> > replica) per shard (total of 5 shards).  This should achieve HA
> > already, correct?
> >
> >
> >
> > On Tue, Dec 15, 2015 at 10:09 PM, Jack Krupansky
> > 
> > wrote:
> >
> > > There is no HA with a single replica for each shard. Replication factor
> > > must be at least 2 for HA.
> > >
> > > -- Jack Krupansky
> > >
> > > On Wed, Dec 16, 2015 at 12:38 AM, Peter Tan 
> wrote:
> > >
> > > > Hi Jack, What happens when there is only one replica setup?
> > > >
> > > > On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky <
> > > jack.krupan...@gmail.com>
> > > > wrote:
> > > >
> > > > > Solr Cloud provides HA when you configure at least two replicas for
> > > each
> > > > > shard and have at least 3 zookeepers. That's it. No deck or detail
> > > > document
> > > > > is needed.
> > > > >
> > > > >
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Tue, Dec 15, 2015 at 9:07 PM, <
> k.viswanath.bhara...@accenture.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Team,
> > > > > >
> > > > > > Can you help me in understanding in achieving the Solr High
> > > > Availability
> > > > > .
> > > > > >
> > > > > > Appreciate you have a detail document or Deck on more details.
> > > > > >
> > > > > > Thank you
> > > > > > Viswanath Bharathi
> > > > > > Accenture | Delivery Centres for Technology in India
> > > > > > CDC 2, Chennai, India
> > > > > > Mobile: +91 9886259010
> > > > > > www.accenture.com | www.avanade.com<
> > > > > > http://www.avanade.com/>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: SolrCloud 4.8.1 - commit wait

2015-12-16 Thread Erick Erickson
Quick scan, but probably this:
 INFO
 o.a.solr.spelling.suggest.Suggester - build()

The suggester build process can easily take many minutes, there's some
explanation here:
https://lucidworks.com/blog/2015/03/04/solr-suggester/

the short form is that depending on how it's defined, it may have to
read _all_ the
documents in your entire corpus to build the suggester structures. And
you apparently
have buildOnCommit set to true.

Note particularly the caveats there about the Solr version required so that
buildOnStartup=false is honored.

Best,
Erick
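
For reference, the relevant knobs sit on the suggester definition in
solrconfig.xml; a rough sketch (the names, field and lookup/dictionary
implementations are just examples):

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="buildOnCommit">false</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>

With buildOnCommit=true, every hard commit triggers a full rebuild, which
would match the roughly ten-minute waits in the logs below.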

On Wed, Dec 16, 2015 at 2:34 AM, Vincenzo D'Amore  wrote:
> Hi,
>
> an update. Hope you can help me.
>
> I have stopped all the other working collections, in order to have a clean
> log file.
>
> at 11:01:16 an hard commit has been issued
>
> 2015-12-16 11:01:49,839 [http-bio-8080-exec-824] INFO
>  org.apache.solr.update.UpdateHandler - start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>
> at 11:11:31,344 the commit has been completed.
>
> The commit ended by logging this line; I suppose 615021 is the elapsed time
> in ms (roughly 10 minutes):
>
> 2015-12-16 11:11:31,343 [http-bio-8080-exec-991] INFO
>  o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard2_replica3]
> webapp=/solr path=/update
> params={waitSearcher=true=true=false=javabin=2}
> {commit=} 0 615021
>
> During these 10 minutes, the server logged "only" these lines; looking at
> them I don't see anything strange:
>
> 2015-12-16 11:01:50,705 [http-bio-8080-exec-824] INFO
>  o.a.solr.search.SolrIndexSearcher - Opening
> Searcher@6d5c31e2[catalogo_shard1_replica2]
> main
> 2015-12-16 11:01:50,724 [http-bio-8080-exec-824] INFO
>  org.apache.solr.update.UpdateHandler - end_commit_flush
> 2015-12-16 11:02:20,722 [searcherExecutor-108-thread-1] INFO
>  o.a.solr.spelling.suggest.Suggester - build()
> 2015-12-16 11:02:21,846 [http-bio-8080-exec-824] INFO
>  o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard1_replica2]
> webapp=/solr path=/update
> params={update.distrib=FROMLEADER=true=true=true=false=
> http://192.168.101.118:8080/solr/catalogo_shard2_replica3/_end_point=true=javabin=2=false}
> {commit=} 0 32007
> 2015-12-16 11:05:47,162 [http-bio-8080-exec-1037] INFO
>  org.apache.solr.update.UpdateHandler - start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2015-12-16 11:05:47,970 [http-bio-8080-exec-1037] INFO
>  o.a.solr.search.SolrIndexSearcher - Opening
> Searcher@4ede7ac5[catalogo_shard2_replica3]
> main
> 2015-12-16 11:05:47,989 [http-bio-8080-exec-1037] INFO
>  org.apache.solr.update.UpdateHandler - end_commit_flush
> 2015-12-16 11:06:03,063 [commitScheduler-115-thread-1] INFO
>  org.apache.solr.update.UpdateHandler - start
> commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2015-12-16 11:06:03,896 [commitScheduler-115-thread-1] INFO
>  o.a.solr.search.SolrIndexSearcher - Opening
> Searcher@2bf4fd3a[catalogo_shard3_replica1]
> realtime
> 2015-12-16 11:06:03,913 [commitScheduler-115-thread-1] INFO
>  org.apache.solr.update.UpdateHandler - end_commit_flush
> 2015-12-16 11:06:19,435 [searcherExecutor-111-thread-1] INFO
>  o.a.solr.spelling.suggest.Suggester - build()
> 2015-12-16 11:06:20,589 [http-bio-8080-exec-1037] INFO
>  o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard2_replica3]
> webapp=/solr path=/update
> params={update.distrib=FROMLEADER=true=true=true=false=
> http://192.168.101.118:8080/solr/catalogo_shard2_replica3/_end_point=true=javabin=2=false}
> {commit=} 0 33427
> 2015-12-16 11:08:07,076 [http-bio-8080-exec-1037] INFO
>  org.apache.solr.update.UpdateHandler - start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2015-12-16 11:08:07,076 [http-bio-8080-exec-1037] INFO
>  org.apache.solr.update.UpdateHandler - No uncommitted changes. Skipping
> IW.commit.
> 2015-12-16 11:08:07,076 [http-bio-8080-exec-1037] INFO
>  o.a.solr.search.SolrIndexSearcher - Opening
> Searcher@75b2727f[catalogo_shard3_replica1]
> main
> 2015-12-16 11:08:07,084 [http-bio-8080-exec-1037] INFO
>  org.apache.solr.update.UpdateHandler - end_commit_flush
> 2015-12-16 11:08:39,040 [searcherExecutor-114-thread-1] INFO
>  o.a.solr.spelling.suggest.Suggester - build()
> 2015-12-16 11:08:40,286 [http-bio-8080-exec-1037] INFO
>  o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard3_replica1]
> webapp=/solr path=/update
> params={update.distrib=FROMLEADER=true=true=true=false=
> http://192.168.101.118:8080/solr/catalogo_shard2_replica3/_end_point=true=javabin=2=false}
> {commit=} 0 33211
>
> Could some component be the cause of this wait? Something like a suggester
> or a spellchecker cache?
> But if yes, I should see the activity in the log file, shouldn't I?
>
> Best regards,
> Vincenzo
>
>
> On Sat, Dec 12, 

Re: Solr High Availability

2015-12-16 Thread Peter Tan
Hi Jack,

Appreciate you helping me to clear this up.

For replicationFactor = 1, that means only keeping one copy of each document in
the cluster.

Currently, for our SolrCloud setup, we have two replicas (primary and
replica) per shard (total of 5 shards).  This should achieve HA
already, correct?



On Tue, Dec 15, 2015 at 10:09 PM, Jack Krupansky 
wrote:

> There is no HA with a single replica for each shard. Replication factor
> must be at least 2 for HA.
>
> -- Jack Krupansky
>
> On Wed, Dec 16, 2015 at 12:38 AM, Peter Tan  wrote:
>
> > Hi Jack, What happens when there is only one replica setup?
> >
> > On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> > > Solr Cloud provides HA when you configure at least two replicas for
> each
> > > shard and have at least 3 zookeepers. That's it. No deck or detail
> > document
> > > is needed.
> > >
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Tue, Dec 15, 2015 at 9:07 PM, 
> > > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > Can you help me in understanding in achieving the Solr High
> > Availability
> > > .
> > > >
> > > > Appreciate you have a detail document or Deck on more details.
> > > >
> > > > Thank you
> > > > Viswanath Bharathi
> > > > Accenture | Delivery Centres for Technology in India
> > > > CDC 2, Chennai, India
> > > > Mobile: +91 9886259010
> > > > www.accenture.com | www.avanade.com<
> > > > http://www.avanade.com/>
> > > >
> > > >
> > >
> >
>


Re: warning while indexing

2015-12-16 Thread Alexandre Rafalovitch
Are you sending documents from one client or many?

Looks like an exhaustion of some sort of pool related to commitWithin,
which I assume you are using.

Regards,
Alex
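
For context, commitWithin is typically passed per update request, e.g. (a
sketch; the core name and document are placeholders):

  curl 'http://localhost:8983/solr/<core>/update?commitWithin=10000' \
    -H 'Content-Type: application/json' -d '[{"id": "doc1"}]'

Each such request schedules a commit task on the executor that the trace
below shows as already terminated.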
On 16 Dec 2015 4:11 pm, "Midas A"  wrote:

> Getting the following warning while indexing... Can anybody please tell me
> the reason?
>
>
> java.util.concurrent.RejectedExecutionException: Task
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@9916a67
> rejected from java.util.concurrent.ScheduledThreadPoolExecutor@79f8b5f
> [Terminated,
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
> 2046]
> at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> at
> org.apache.solr.update.CommitTracker._scheduleCommitWithin(CommitTracker.java:150)
> at
> org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118)
> at
> org.apache.solr.update.CommitTracker.addedDocument(CommitTracker.java:169)
> at
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:231)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
> at
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> at
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
> at
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
> at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
> at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
>


Re: JVM error v ~StubRoutines::jbyte_disjoint_arraycopy

2015-12-16 Thread Erick Erickson
https://wiki.apache.org/lucene-java/JavaBugs

See the last entry in the OpenJDK section: you're using one of the
Java versions that has issues. So the first thing I'd try is
upgrading my JVM.

Best,
Erick

On Wed, Dec 16, 2015 at 2:01 PM, abhayd  wrote:
> hi
>
> I have more than 50GB in /tmp; the index size is 6GB.
> I still get the same error
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/JVM-error-v-StubRoutines-jbyte-disjoint-arraycopy-tp4244603p4245886.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Erick Erickson
I think you're still missing the critical bit. Highlighting is
completely separate from searching. In other words, you can search on
one field and highlight another. What field is searched is governed by
the "qf" parameter when using edismax and by the the "df" parameter
configured in your request handler in solrconfig.xml. These defaults
are overridden when you do a "fielded search" like

q=content:nietava

So this: q=content:nietava&hl=true&hl.fl=content
is searching the "content" field. The word you're looking for isn't in
the content field so naturally no docs are returned. And no
highlighting either.

This: q=nietava&hl=true&hl.fl=content

is searching somewhere else, thus getting the hit. We already know
that "nietava" is not in the content field because the first search
failed. You need to find out what field is being matched (probably
something like "text") and then try highlighting on _that_ field. Try
adding "debug=query" to the URL and look at the "parsed_query" section
of the return and you'll see what field(s) is/are actually being
searched against.

NOTE: The field you highlight on _must_ have stored="true" in schema.xml.

As to why "nietava" isn't being found in the content field, probably
you have some kind of analysis chain configured for that field that
isn't searching as you expect. See the admin/analysis page for some
insight into why that would be. The most frequent reason is that the
field is a "string" type which is not broken up into words. Another
possibility is that your analysis chain is leaving in the quotes or
something similar. As James says, looking at admin/analysis is a good
way to figure this out.

I still strongly recommend you go from the stock techproducts example
and get familiar with how Solr (and highlighting) work before jumping
in and changing things. There are a number of ways things can be
mis-configured and trying to change several things at once is a fine
way to go mad. The admin UI>>schema browser is another way you can see
what kind of terms are _actually_ in your index in a particular field.

Best,
Erick
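
For example (a sketch against this thread's setup):

  http://localhost:8983/solr/techproducts/select?q=nietava&debug=query

The parsed_query entry in the response will show the field the bare term was
actually searched against, e.g. text:nietava if the df field is "text".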




On Wed, Dec 16, 2015 at 12:26 PM, Teague James  wrote:
> Sorry to hear that didn't work! Let me ask a couple of questions...
>
> Have you tried the analyzer inside of the Admin Interface? It has helped me 
> sort out a number of highlighting issues in the past. To access it, go to 
> your Admin interface, select your core, then select Analysis from the list of 
> options on the left. In the analyzer, enter the term you are indexing in the 
> top left (in other words the term in the document you are indexing that you 
> expect to get a hit on) and right input fields. Select the field that it is 
> destined for (in your case that would be 'content'), then hit analyze. Helps 
> if you have a big screen!
>
> This will show you the impact of the various filter factories that you have 
> engaged and their effect on whether or not a 'hit' is being generated. Hits 
> are identified by a very faint highlight. (PSST... Developers... It would be 
> really cool if the highlight color were more visible or customizable... 
> Thanks y'all) If it looks like you're getting hits, but not getting 
> highlighting, then open up a new tab with the Admin's query interface. Same 
> place on the left as the analyzer. Replace the "*:*" with your search term 
> (assuming you already indexed your document) and if necessary you can put 
> something in the FQ like "id:123456" to target a specific record.
>
> Did you get a hit? If no, then it's not highlighting that's the issue. If 
> yes, then try dumping this in your address bar (using your URL/IP, search 
> term, and core name of course. The fq= is an example) :
> http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456&q="[SEARCH-TERM]"
>
> That will dump Solr's output to your browser where you can see exactly what 
> is getting hit.
>
> Hope that helps! Let me know how it goes. Good luck.
>
> -Teague
>
> -Original Message-
> From: Evert R. [mailto:evert.ra...@gmail.com]
> Sent: Wednesday, December 16, 2015 1:46 PM
> To: solr-user 
> Subject: Re: Solr Basic Configuration - Highlight - Begginer
>
> Hi Teague!
>
> I configured the solrconfig.xml and schema.xml exactly the way you did, only 
> substituting the word 'documentText' with 'content' as used by the techproducts 
> sample. I reindexed with:
>
>  curl '
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true'
> -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>
> with the same result: no highlighting in the response, as below:
>
> "highlighting": { "pdf1": {} }
>
> =(
>
> Really... do not know what to do...
>
> Thanks for your time. If you have any more suggestions about where I could be 
> missing something... please let me know.
>
>
> Best regards,
>
> *Evert*
>
> 2015-12-16 15:30 GMT-02:00 Teague James :
>
>> Hi Evert,
>>
>> I recently needed help with phrase highlighting and was pointed to the

RE: DIH Caching w/ BerkleyBackedCache

2015-12-16 Thread Dyer, James
Todd,

I have no idea if this will perform acceptably with so many multiple values.  I 
doubt the solr/patch code was really optimized for such a use case.  In my 
production environment, I have je-6.2.31.jar on the classpath.  I don't think 
I've tried it with other versions.

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?
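
For what it's worth, here is a hedged sketch of the JE cleaner knobs in
com.sleepycat.je.EnvironmentConfig; how (or whether) these reach the
Environment built by the SOLR-2613 cache depends on the patch, so treat
the wiring as an assumption:

    import java.io.File;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;

    EnvironmentConfig cfg = new EnvironmentConfig();
    cfg.setAllowCreate(true);
    // utilization target the cleaner maintains; lower = less aggressive cleaning (default 50)
    cfg.setConfigParam(EnvironmentConfig.CLEANER_MIN_UTILIZATION, "25");
    // size of each .jdb log file before JE rolls over to a new one
    cfg.setConfigParam(EnvironmentConfig.LOG_FILE_MAX, String.valueOf(1024L * 1024 * 1024));
    Environment env = new Environment(new File("/path/to/cacheDir"), cfg);

JE also reads a je.properties file from the environment home directory, so
the same properties can be experimented with there without code changes.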

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but just incredibly slow... I can get a full
index without caching in about two hours, however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd






Re: query to get parents without childs

2015-12-16 Thread Novin

"Index the number of children into the parent as an integer" is nice and easy 
solution. But I would like to know about"

You could probably do that inside an UpdateProcessor, even using the
Javascript ScriptUpdateProcessor. Probably simpler though in the code
that pushes the docs to Solr." either.
Can you point me to any documentation related to above, would be very helpful.

Thanks


On 16/12/2015 21:52, Upayavira wrote:

So that's a good question - how do you identify parent documents that
*do not* have child documents.

I'm not sure how you would do that. However, you could index the number
of children into the parent as an integer, then it would be easy.

You could probably do that inside an UpdateProcessor, even using the
Javascript ScriptUpdateProcessor. Probably simpler though in the code
that pushes the docs to Solr.

Upayavira

On Wed, Dec 16, 2015, at 09:05 PM, Novin Novin wrote:

Hi Scott,

Actually, it is not a multivalued field; it is a nested document.

Novin

On 16 December 2015 at 20:33, Scott Stults <
sstu...@opensourceconnections.com> wrote:


Hi Novin,

How are you associating parents with children? Is it a "children"
multivalued field in the parent record? If so you could query for records
that don't have a value in that field like "-children:[* TO *]"

k/r,
Scott

On Wed, Dec 16, 2015 at 7:29 AM, Novin Novin  wrote:


Hi guys,

I have a few parent documents indexed without children; what would be the query
to get those?

Thanks,
Novin




--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com





Re: query to get parents without childs

2015-12-16 Thread Upayavira
So that's a good question - how do you identify parent documents that
*do not* have child documents.

I'm not sure how you would do that. However, you could index the number
of children into the parent as an integer, then it would be easy.

You could probably do that inside an UpdateProcessor, even using the
Javascript ScriptUpdateProcessor. Probably simpler though in the code
that pushes the docs to Solr.
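
As a rough sketch of that client-side variant (SolrJ; the numChildren_i
field name and the buildChildren() helper are made up for illustration):

    import java.util.List;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "parent-1");

    List<SolrInputDocument> children = buildChildren(); // however you produce them today
    for (SolrInputDocument child : children) {
      parent.addChildDocument(child);
    }

    // store the count so "parents without children" becomes a simple query
    parent.addField("numChildren_i", children.size());
    client.add(parent);

Then finding childless parents is just q=numChildren_i:0, plus whatever
marks a document as a parent in your schema.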

Upayavira

On Wed, Dec 16, 2015, at 09:05 PM, Novin Novin wrote:
> Hi Scott,
> 
> Actually, it is not a multivalued field; it is a nested document.
> 
> Novin
> 
> On 16 December 2015 at 20:33, Scott Stults <
> sstu...@opensourceconnections.com> wrote:
> 
> > Hi Novin,
> >
> > How are you associating parents with children? Is it a "children"
> > multivalued field in the parent record? If so you could query for records
> > that don't have a value in that field like "-children:[* TO *]"
> >
> > k/r,
> > Scott
> >
> > On Wed, Dec 16, 2015 at 7:29 AM, Novin Novin  wrote:
> >
> > > Hi guys,
> > >
> > > I have a few parent documents indexed without children; what would be the
> > > query to get those?
> > >
> > > Thanks,
> > > Novin
> > >
> >
> >
> >
> > --
> > Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> > | 434.409.2780
> > http://www.opensourceconnections.com
> >


Strange debug output for a slow query

2015-12-16 Thread Shawn Heisey
Here is the query URL that I did.  The info included in this message is
slightly redacted.

http://bigindy5.REDACTED.com:8982/solr/sparkmain/search?q=%28german+shepherd%29=/search=0=NOT%28feature:redact1+OR+feature:spkhistorical%29=%28ip:%28AP%29+AND+price:0%29+OR+%28ip:%28BB%29%29+OR+%28ip:%28COR%29+AND+price:0%29+OR+%28ip:%28GET%29+AND+%28collection:subscription+OR+collection:editorialsubscription%29%29+OR+%28ip:%28PA%29%29+OR+%28ip:%28RTR%29%29+OR+%28ip:%28RX%29+AND+price:0%29+OR+%28ip:%28USAT%29+AND+price:0%29+OR+%28ip:%28AFP%29%29+OR+%28ip:%28GET%29+AND+NOT+collection:subscription+AND+NOT+collection:editorialsubscription%29+OR+%28ip:%28RX%29%29=restr:%28%28worldwide+OR+none+OR+aus_i%29+AND+NOT+%28aus_x%29%29=doc_date:[1900-01-01T00:00:00Z+TO+2015-12-12T00:00:00Z]=post_date+desc=75=true=true=json

Sent to a set of production servers running 4.9.1 (when the caches are
cold), this takes about 7 seconds.  Sent to a 5.3.2-SNAPSHOT dev server
with cold caches, it takes about 15 seconds -- because that server is
particularly low on memory.  Once the query is cached, it takes 100
milliseconds or less, even on the dev server.

Checking one of the shard indexes with the schema browser, ip has 34
unique terms, feature has 108 unique terms, collection has 824 unique
terms, restr has 128 unique terms.  As expected, doc_date has about 12
million unique terms for the shard.  It is a TrieDateField with a
precisionStep of 16.  The rest of the shards have similar unique term
counts.  The entire sharded index has 244 million documents in it - six
shards with 40.6 million each and one shard with under 500K documents.

I've been trying to figure out why this query is so slow.  I can't see
anything obvious, but I did encounter something really weird in the
debug output.

This is the params section of the response -- as you can see, echoParams
is set to all, and you can see the shards parameter defined in
solrconfig.xml:

http://apaste.info/pGe

This is the filter query info from the debug -- showing the same set of
filters seven times, which I assume is because there are seven shards. 
I do not know if this is a debug glitch.  The response info here is from
the dev server, but the production servers give the same info:

http://apaste.info/HpC

Does anyone have thoughts about the repeated filter information in the
debug output, or why it takes several seconds for this query to run?

General performance on the production index is pretty good.  Over the
last 9800 queries, the production server has a median qtime of 238ms and
a 95th percentile of 3672ms.  The query rate is less than one per second.

Thanks,
Shawn



Re: Append fields to a document

2015-12-16 Thread Alexandre Rafalovitch
If you enable lazy field loading (enableLazyFieldLoading in solrconfig.xml)
and do not request them in your 'fl' list, they should mostly just cost
size on disk, AFAIK.
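
For completeness: the usual server-side way to append fields afterwards is
an atomic update, which only works when all fields are stored, since Solr
rebuilds the document internally. A hedged SolrJ sketch, with the field
names made up and "client" being your already-initialized client instance:

    import java.util.Collections;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42"); // existing uniqueKey
    // "add" appends to a multivalued field; "set" would replace the value
    doc.addField("tags_ss", Collections.singletonMap("add", "new-tag"));
    client.add(doc);
    client.commit();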

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 December 2015 at 08:09, Jamie Johnson  wrote:
> The expense is in gathering the pieces to do the indexing.  There isn't
> much that I can do in that regard unfortunately.  I need to investigate
> storing the fields, if they aren't returned is the expense just size on
> disk or is there a memory cost as well?
> On Dec 16, 2015 7:43 PM, "Alexandre Rafalovitch"  wrote:
>
>> ExternalFileField might be useful in some situations.
>>
>> But also, is it possible that your Solr schema configuration is not
>> best suited for your domain? Is it - for example - possible that the
>> additional data should be in child records?
>>
>> Pure guesswork here, not enough information. But, as described, Solr
>> will not be able to fulfill your needs easily. Something will need to
>> change.
>>
>> Regards,
>>Alex.
>>
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 16 December 2015 at 22:09, Jamie Johnson  wrote:
>> > I have a use case where we only need to append some fields to a document.
>> > To retrieve the full representation is very expensive but I can easily
>> get
>> > the deltas.  Is it possible to just add fields to an existing Solr
>> > document?  I experimented with using overwrite=false, but that resulted
>> in
>> > two documents with the same uniqueKey in the index (which makes sense).
>> Is
>> > there a way to accomplish what I'm looking to do in Solr?  My fields
>> aren't
>> > all stored and think it will be too expensive for me to make that change.
>> > Any thoughts would be really appreciated.
>>


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Erick Erickson
bq: but when highlight, using the text field...nothing comes up...

http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

It's unclear what this means. No results showed up (i.e. numFound==0)
or no highlighting showed up? Assuming that
1> the "text" field has stored=true and
2> you find documents when searching on the "text" field
the above should show something in the highlights section.
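With both of those true, a match should produce something roughly like
this (a sketch, not actual output):

    "highlighting": {
      "pdf1": {
        "text": ["... snippet with <em>nietava</em> ..."]
      }
    }

rather than the empty "pdf1": {} you have been getting.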

Please take the time to provide complete details. Guessing what you're
doing is wasting time, mine and yours. Once more:
1> what is the schema definition for the "text" field. Include the
fieldType definition
2> What is the result of adding debug=query to the query when you
don't get highlights?
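
For example (assuming the techproducts core from earlier in the thread):

http://localhost:8983/solr/techproducts/select?q=nietava&debug=query&wt=json

and then look at the parsed query in the "debug" section to see which
field(s) the bare term was actually searched against.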

You might review: http://wiki.apache.org/solr/UsingMailingLists
because it's becoming quite frustrating that you give us little bits
of information that leave us guessing what you're _really_ doing.
Highlighting is working for lots of people in lots of sites, it's not
likely that this functionality is completely broken so the answer will
be in the docs.

Best,
Erick

On Wed, Dec 16, 2015 at 5:54 PM, Evert R.  wrote:
> Hi Erick and Teague,
>
>
> I found that when using the field 'text' it shows the pdf file result
> id:pdf1 in this case, like:
>
> http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava
>
> but when highlight, using the text field...nothing comes up...
>
> http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> or even with the option
>
> f.text.hl.snippets=2 under the hl.fl field.
>
>
> I tried as well with the standard configuration, did it all over, reindexed
> a couple times... and still did not work.
>
> Also,
>
> Using the Analysis, it brings below information:
>
> ST   text: nietava | raw_bytes: [6e 69 65 74 61 76 61] | start: 0 | end: 7 | positionLength: 1 | type: <ALPHANUM> | position: 1
> SF   text: nietava | raw_bytes: [6e 69 65 74 61 76 61] | start: 0 | end: 7 | positionLength: 1 | type: <ALPHANUM> | position: 1
> LCF  text: nietava | raw_bytes: [6e 69 65 74 61 76 61] | start: 0 | end: 7 | positionLength: 1 | type: <ALPHANUM> | position: 1
>
>
> Alphanumeric I think... so, it's 'string', right? Would that be a problem?
> Should be some other indication?
>
>
> Thanks again!
>
>
> *Evert*
>
> 2015-12-16 21:09 GMT-02:00 Erick Erickson :
>
>> I think you're still missing the critical bit. Highlighting is
>> completely separate from searching. In other words, you can search on
>> one field and highlight another. What field is searched is governed by
>> the "qf" parameter when using edismax and by the the "df" parameter
>> configured in your request handler in solrconfig.xml. These defaults
>> are overridden when you do a "fielded search" like
>>
>> q=content:nietava
>>
>> So this: q=content:nietava&hl=true&hl.fl=content
>> is searching the "content" field. The word you're looking for isn't in
>> the content field so naturally no docs are returned. And no
>> highlighting either.
>>
>> This: q=nietava&hl=true&hl.fl=content
>>
>> is searching somewhere else, thus getting the hit. We already know
>> that "nietava" is not in the content field because the first search
>> failed. You need to find out what field is being matched (probably
>> something like "text") and then try highlighting on _that_ field. Try
>> adding "debug=query" to the URL and look at the "parsed_query" section
>> of the return and you'll see what field(s) is/are actually being
>> searched against.
>>
>> NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
>>
>> As to why "nietava" isn't being found in the content field, probably
>> you have some kind of analysis chain configured for that field that
>> isn't searching as you expect. See the admin/analysis page for some
>> insight into why that would be. The most frequent reason is that the
>> field is a "string" type which is not broken up into words. Another
>> possibility is that your analysis chain is leaving in the quotes or
>> something similar. As James says, looking at admin/analysis is a good
>> way to figure this out.
>>
>> I still strongly recommend you go from the stock techproducts example
>> and get familiar with how Solr (and highlighting) work before jumping
>> in and changing things. There are a number of ways things can be
>> mis-configured and trying to change several things at once is a fine
>> way to go mad. The admin UI>>schema browser is another way you can see
>> what kind of terms are _actually_ in your index in a particular field.
>>
>> Best,
>> Erick
>>
>>
>>
>>
>> On Wed, Dec 16, 2015 at 12:26 PM, Teague James 
>> wrote:
>> > Sorry to hear that didn't work! Let me ask a couple of questions...
>> >
>> > Have you tried the analyzer inside of the Admin Interface? It has helped
>> me sort out a number of highlighting issues in the past. To access it, go
>> to your Admin interface, select your core, then select Analysis from the
>> list of options on the left. In the analyzer, enter the 

Re: warning while indexing

2015-12-16 Thread Midas A
Alexandre ,

we are running multiple  DIH to index data.

On Thu, Dec 17, 2015 at 12:40 AM, Alexandre Rafalovitch 
wrote:

> Are you sending documents from one client or many?
>
> Looks like an exhaustion of some sort of pool related to Commit within,
> which I assume you are using.
>
> Regards,
> Alex
> On 16 Dec 2015 4:11 pm, "Midas A"  wrote:
>
> > Getting following warning while indexing ..Anybody please tell me the
> > reason .
> >
> >
> > java.util.concurrent.RejectedExecutionException: Task
> >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@9916a67
> > rejected from java.util.concurrent.ScheduledThreadPoolExecutor@79f8b5f
> > [Terminated,
> > pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
> > 2046]
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> > at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> > at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> > at
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithin(CommitTracker.java:150)
> > at
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118)
> > at
> >
> org.apache.solr.update.CommitTracker.addedDocument(CommitTracker.java:169)
> > at
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:231)
> > at
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> > at
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
> > at
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
> > at
> >
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> > at
> > org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
> > at
> >
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
> > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
> > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
> > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
> > at
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
> > at
> >
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
> > at
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
> > at
> >
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
> >
>


Re: warning while indexing

2015-12-16 Thread Alexandre Rafalovitch
Ah. Then it might be that DIH cannot be run in parallel. Though the
exception is much lower in the stack.

Not sure. Maybe somebody else with more knowledge in the commit path can
comment on it.
On 17 Dec 2015 12:21 pm, "Midas A"  wrote:

> Alexandre,
>
> *Only two DIH, indexing different data.  *
>
> On Thu, Dec 17, 2015 at 10:46 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> wrote:
>
> > How many? On the same node?
> >
> > I am not sure if running multiple DIH is a popular case.
> >
> > My theory, still, that you are running out of a pool size there. Though
> if
> > it happens with even just two DIH, it could be a different issue.
> > On 17 Dec 2015 12:01 pm, "Midas A"  wrote:
> >
> > > Alexandre ,
> > >
> > > we are running multiple  DIH to index data.
> > >
> > > On Thu, Dec 17, 2015 at 12:40 AM, Alexandre Rafalovitch <
> > > arafa...@gmail.com>
> > > wrote:
> > >
> > > > Are you sending documents from one client or many?
> > > >
> > > > Looks like an exhaustion of some sort of pool related to Commit
> within,
> > > > which I assume you are using.
> > > >
> > > > Regards,
> > > > Alex
> > > > On 16 Dec 2015 4:11 pm, "Midas A"  wrote:
> > > >
> > > > > Getting following warning while indexing ..Anybody please tell me
> the
> > > > > reason .
> > > > >
> > > > >
> > > > > java.util.concurrent.RejectedExecutionException: Task
> > > > >
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@9916a67
> > > > > rejected from
> > java.util.concurrent.ScheduledThreadPoolExecutor@79f8b5f
> > > > > [Terminated,
> > > > > pool size = 0, active threads = 0, queued tasks = 0, completed
> tasks
> > =
> > > > > 2046]
> > > > > at
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> > > > > at
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> > > > > at
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> > > > > at
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithin(CommitTracker.java:150)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.CommitTracker.addedDocument(CommitTracker.java:169)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:231)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> > > > > at
> > > > >
> > >
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
> > > > > at
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
> > > > >
> > > >
> > >
> >
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-16 Thread William Bell
Same question here

Wondering if faceting performance is fixed and how to take advantage of it?

On Wed, Dec 16, 2015 at 2:57 AM, Vincenzo D'Amore 
wrote:

> Hi all,
>
> given that Solr 5.4 is finally released, is this the most stable and
> efficient version of SolrCloud?
>
> I have a website which receives many search requests. It normally serves
> about 2000 concurrent requests, but sometimes there are peaks from 4000 to
> 1 requests in a few seconds.
>
> In January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
> a brand new version, but following this thread I read about the problems
> that can occur upgrading to the latest version.
>
> I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
> is fixed in 5.4.
>
> I'm using standard faceting without docValues. Should I add docValues in
> order to benefit from that fix?
>
> Best regards,
> Vincenzo
>
>
>
> On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com
> > wrote:
>
> > Uwe, it's good to know! I mean that you've recovered. Take care!
> >
> > On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh 
> > wrote:
> >
> > > Sorry for the delay. I had an ugly flu.
> > >
> > > SOLR-7730 seems to work fine. Using docValues with Solr
> > > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> > > (90ms vs. 2ms) :-)
> > >
> > > Thanks
> > > Uwe
> > >
> > >
> > >
> > >
> > > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> > >
> > >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh 
> > >> wrote:
> > >>
> > >> When 5.4 with SOLR-7730 will be released, I will start to use
> docValues.
> > >>> Going this way, seems more straight forward to me.
> > >>>
> > >>
> > >>
> > >> Sure. Giving your answers docValues facets has a really good chance to
> > >> perform in your index after SOLR-7730. It's really interesting to see
> > >> performance numbers on early 5.4 builds:
> > >>
> > >>
> >
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> > >>
> > >>
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: warning while indexing

2015-12-16 Thread Alexandre Rafalovitch
How many? On the same node?

I am not sure if running multiple DIH is a popular case.

My theory, still, that you are running out of a pool size there. Though if
it happens with even just two DIH, it could be a different issue.
On 17 Dec 2015 12:01 pm, "Midas A"  wrote:

> Alexandre ,
>
> we are running multiple  DIH to index data.
>
> On Thu, Dec 17, 2015 at 12:40 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> wrote:
>
> > Are you sending documents from one client or many?
> >
> > Looks like an exhaustion of some sort of pool related to Commit within,
> > which I assume you are using.
> >
> > Regards,
> > Alex
> > On 16 Dec 2015 4:11 pm, "Midas A"  wrote:
> >
> > > Getting following warning while indexing ..Anybody please tell me the
> > > reason .
> > >
> > >
> > > java.util.concurrent.RejectedExecutionException: Task
> > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@9916a67
> > > rejected from java.util.concurrent.ScheduledThreadPoolExecutor@79f8b5f
> > > [Terminated,
> > > pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
> > > 2046]
> > > at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> > > at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> > > at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> > > at
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> > > at
> > >
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithin(CommitTracker.java:150)
> > > at
> > >
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118)
> > > at
> > >
> >
> org.apache.solr.update.CommitTracker.addedDocument(CommitTracker.java:169)
> > > at
> > >
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:231)
> > > at
> > >
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> > > at
> > >
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> > > at
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
> > > at
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
> > > at
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
> > > at
> > >
> >
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> > > at
> > >
> org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
> > > at
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
> > >
> >
>


Re: warning while indexing

2015-12-16 Thread Midas A
Alexandre,

*Only two DIH, indexing different data.  *

On Thu, Dec 17, 2015 at 10:46 AM, Alexandre Rafalovitch 
wrote:

> How many? On the same node?
>
> I am not sure if running multiple DIH is a popular case.
>
> My theory, still, that you are running out of a pool size there. Though if
> it happens with even just two DIH, it could be a different issue.
> On 17 Dec 2015 12:01 pm, "Midas A"  wrote:
>
> > Alexandre ,
> >
> > we are running multiple  DIH to index data.
> >
> > On Thu, Dec 17, 2015 at 12:40 AM, Alexandre Rafalovitch <
> > arafa...@gmail.com>
> > wrote:
> >
> > > Are you sending documents from one client or many?
> > >
> > > Looks like an exhaustion of some sort of pool related to Commit within,
> > > which I assume you are using.
> > >
> > > Regards,
> > > Alex
> > > On 16 Dec 2015 4:11 pm, "Midas A"  wrote:
> > >
> > > > Getting following warning while indexing ..Anybody please tell me the
> > > > reason .
> > > >
> > > >
> > > > java.util.concurrent.RejectedExecutionException: Task
> > > >
> > > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@9916a67
> > > > rejected from
> java.util.concurrent.ScheduledThreadPoolExecutor@79f8b5f
> > > > [Terminated,
> > > > pool size = 0, active threads = 0, queued tasks = 0, completed tasks
> =
> > > > 2046]
> > > > at
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> > > > at
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> > > > at
> > > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> > > > at
> > > >
> > >
> >
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithin(CommitTracker.java:150)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.CommitTracker.addedDocument(CommitTracker.java:169)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:231)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
> > > > at
> > > >
> > >
> >
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> > > > at
> > > >
> > org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
> > > > at
> > > >
> > >
> >
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
> > > >
> > >
> >
>


Re: Issues when indexing PDF files

2015-12-16 Thread Zheng Lin Edwin Yeo
I've checked all the files which have problems with the content in the Solr
index using the Tika app. All of them show the same issues as what I see
in the Solr index.

So does the issue lie with the encoding of the file? Are we able to check
the encoding of the file?
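
One way to narrow it down, as a hedged sketch: run the extraction yourself
with the Tika API and see whether the CJK text is already mangled before
Solr is involved:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class TikaDump {
      public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
          BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
          new AutoDetectParser().parse(in, handler, new Metadata());
          // if the Chinese text already prints as "??", extraction (fonts or
          // missing ToUnicode maps in the PDF), not Solr, is losing it
          System.out.println(handler.toString());
        }
      }
    }

Also make sure the console itself can display CJK, or redirect the output
to a file and open it as UTF-8, so the terminal encoding doesn't masquerade
as the bug.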


Regards,
Edwin


On 17 December 2015 at 00:33, Zheng Lin Edwin Yeo 
wrote:

> Hi Erik,
>
> I've shared the file on dropbox, which you can access via the link here:
> https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0
>
> This is what I get from the Tika app after dropping the file in.
>
> Content-Length: 75092
> Content-Type: application/pdf
> Type: COSName{Info}
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-TIKA:digest:MD5: de67120e29ec7ffa24aec7e17104b6bf
> X-TIKA:digest:SHA256:
> d0f04580d87290c1bc8068f3d5b34d797a0d8ccce2b18f626a37958c439733e7
> access_permission:assemble_document: true
> access_permission:can_modify: true
> access_permission:can_print: true
> access_permission:can_print_degraded: true
> access_permission:extract_content: true
> access_permission:extract_for_accessibility: true
> access_permission:fill_in_form: true
> access_permission:modify_annotations: true
> dc:format: application/pdf; version=1.3
> pdf:PDFVersion: 1.3
> pdf:encrypted: false
> producer: null
> resourceName: Desmophen+670+BAe.pdf
> xmpTPg:NPages: 3
>
>
> Regards,
> Edwin
>
>
> On 17 December 2015 at 00:15, Erik Hatcher  wrote:
>
>> Edwin - Can you share one of those PDF files?
>>
>> Also, drop the file into the Tika app and see what it sees directly - get
>> the tika-app JAR and run that desktop application.
>>
>> Could be an encoding issue?
>>
>> Erik
>>
>> —
>> Erik Hatcher, Senior Solutions Architect
>> http://www.lucidworks.com 
>>
>>
>>
>> > On Dec 16, 2015, at 10:51 AM, Zheng Lin Edwin Yeo 
>> wrote:
>> >
>> > Hi,
>> >
>> > I'm using Solr 5.3.0
>> >
>> > I'm indexing some PDF documents. However, for certain PDF files, there
>> are
>> > chinese text in the documents, but after indexing, what is indexed in
>> the
>> > content is either a series of "??" or an empty content.
>> >
>> > I'm using the post.jar that comes together with Solr.
>> >
>> > What could be the reason that causes this?
>> >
>> > Regards,
>> > Edwin
>>
>>
>


Re: Append fields to a document

2015-12-16 Thread Alexandre Rafalovitch
ExternalFileField might be useful in some situations.

But also, is it possible that your Solr schema configuration is not
best suited for your domain? Is it - for example - possible that the
additional data should be in child records?

Pure guesswork here, not enough information. But, as described, Solr
will not be able to fulfill your needs easily. Something will need to
change.

Regards,
   Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 16 December 2015 at 22:09, Jamie Johnson  wrote:
> I have a use case where we only need to append some fields to a document.
> To retrieve the full representation is very expensive but I can easily get
> the deltas.  Is it possible to just add fields to an existing Solr
> document?  I experimented with using overwrite=false, but that resulted in
> two documents with the same uniqueKey in the index (which makes sense).  Is
> there a way to accomplish what I'm looking to do in Solr?  My fields aren't
> all stored and I think it will be too expensive for me to make that change.
> Any thoughts would be really appreciated.


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Erick and Teague,


I found that when using the field 'text' it shows the pdf file result
id:pdf1 in this case, like:

http://localhost:8983/solr/techproducts/select?fq=id:pdf1&q=nietava

but when highlight, using the text field...nothing comes up...

http://localhost:8983/solr/techproducts/select?q=text:nietava&fq=id:pdf1&wt=json&indent=true&hl=true&hl.fl=text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

or even with the option

f.text.hl.snippets=2 under the hl.fl field.


I tried as well with the standard configuration, did it all over, reindexed
a couple times... and still did not work.

Also,

Using the Analysis, it brings below information:

ST   text: nietava | raw_bytes: [6e 69 65 74 61 76 61] | start: 0 | end: 7 | positionLength: 1 | type: <ALPHANUM> | position: 1
SF   text: nietava | raw_bytes: [6e 69 65 74 61 76 61] | start: 0 | end: 7 | positionLength: 1 | type: <ALPHANUM> | position: 1
LCF  text: nietava | raw_bytes: [6e 69 65 74 61 76 61] | start: 0 | end: 7 | positionLength: 1 | type: <ALPHANUM> | position: 1

Alphanumeric I think... so, it's 'string', right? Would that be a problem?
Should be some other indication?


Thanks again!


*Evert*

2015-12-16 21:09 GMT-02:00 Erick Erickson :

> I think you're still missing the critical bit. Highlighting is
> completely separate from searching. In other words, you can search on
> one field and highlight another. What field is searched is governed by
> the "qf" parameter when using edismax and by the the "df" parameter
> configured in your request handler in solrconfig.xml. These defaults
> are overridden when you do a "fielded search" like
>
> q=content:nietava
>
> So this: q=content:nietava&hl=true&hl.fl=content
> is searching the "content" field. The word you're looking for isn't in
> the content field so naturally no docs are returned. And no
> highlighting either.
>
> This: q=nietava&hl=true&hl.fl=content
>
> is searching somewhere else, thus getting the hit. We already know
> that "nietava" is not in the content field because the first search
> failed. You need to find out what field is being matched (probably
> something like "text") and then try highlighting on _that_ field. Try
> adding "debug=query" to the URL and look at the "parsed_query" section
> of the return and you'll see what field(s) is/are actually being
> searched against.
>
> NOTE: The field you highlight on _must_ have stored="true" in schema.xml.
>
> As to why "nietava" isn't being found in the content field, probably
> you have some kind of analysis chain configured for that field that
> isn't searching as you expect. See the admin/analysis page for some
> insight into why that would be. The most frequent reason is that the
> field is a "string" type which is not broken up into words. Another
> possibility is that your analysis chain is leaving in the quotes or
> something similar. As James says, looking at admin/analysis is a good
> way to figure this out.
>
> I still strongly recommend you go from the stock techproducts example
> and get familiar with how Solr (and highlighting) work before jumping
> in and changing things. There are a number of ways things can be
> mis-configured and trying to change several things at once is a fine
> way to go mad. The admin UI>>schema browser is another way you can see
> what kind of terms are _actually_ in your index in a particular field.
>
> Best,
> Erick
>
>
>
>
> On Wed, Dec 16, 2015 at 12:26 PM, Teague James 
> wrote:
> > Sorry to hear that didn't work! Let me ask a couple of questions...
> >
> > Have you tried the analyzer inside of the Admin Interface? It has helped
> me sort out a number of highlighting issues in the past. To access it, go
> to your Admin interface, select your core, then select Analysis from the
> list of options on the left. In the analyzer, enter the term you are
> indexing in the top left (in other words the term in the document you are
> indexing that you expect to get a hit on) and right input fields. Select
> the field that it is destined for (in your case that would be 'content'),
> then hit analyze. Helps if you have a big screen!
> >
> > This will show you the impact of the various filter factories that you
> have engaged and their effect on whether or not a 'hit' is being generated.
> Hits are idietified by a very feint highlight. (PSST... Developers... It
> would be really cool if the highlight color were more visible or
> customizable... Thanks y'all) If it looks like you're getting hits, but not
> getting highlighting, then open up a new tab with the Admin's query
> interface. Same place on the left as the analyzer. Replace the "*:*" with
> your search term (assuming you already indexed your document) and if
> necessary you can put something in the FQ like "id:123456" to target a
> specific record.
> >
> > Did you get a hit? If no, then it's not highlighting that's the issue.
> If yes, then try dumping this in your address bar (using your URL/IP,
> search term, and core name of course. The fq= is an example) :
> > http://[URL/IP]/solr/[CORE-NAME]/select?fq=id:123456="[SEARCH-TERM];
> >
> > That will dump Solr's output to your 

Re: Strange debug output for a slow query

2015-12-16 Thread Erick Erickson
Hmmm, take a look at the individual queries on a shard, i.e. peek at
the Solr logs and see if the fq clause comes through cleanly when you
see =false. I suspect this is just a glitch in assembling the
debug response. If it is, it probably deserves a JIRA. In fact it
deserves a JIRA in either case I think.

I don't see anything obvious, but your statement "when the caches are
cold" points to autowarming as your culprit. What to you have set up
for autowarming in your caches? And do you have any newSearcher or
firstSearcher events defined?

Best,
Erick

On Wed, Dec 16, 2015 at 3:50 PM, Shawn Heisey  wrote:
> Here is the query URL that I did.  The info included in this message is
> slightly redacted.
>
> http://bigindy5.REDACTED.com:8982/solr/sparkmain/search?q=%28german+shepherd%29=/search=0=NOT%28feature:redact1+OR+feature:spkhistorical%29=%28ip:%28AP%29+AND+price:0%29+OR+%28ip:%28BB%29%29+OR+%28ip:%28COR%29+AND+price:0%29+OR+%28ip:%28GET%29+AND+%28collection:subscription+OR+collection:editorialsubscription%29%29+OR+%28ip:%28PA%29%29+OR+%28ip:%28RTR%29%29+OR+%28ip:%28RX%29+AND+price:0%29+OR+%28ip:%28USAT%29+AND+price:0%29+OR+%28ip:%28AFP%29%29+OR+%28ip:%28GET%29+AND+NOT+collection:subscription+AND+NOT+collection:editorialsubscription%29+OR+%28ip:%28RX%29%29=restr:%28%28worldwide+OR+none+OR+aus_i%29+AND+NOT+%28aus_x%29%29=doc_date:[1900-01-01T00:00:00Z+TO+2015-12-12T00:00:00Z]=post_date+desc=75=true=true=json
>
> Sent to a set of production servers running 4.9.1 (when the caches are
> cold), this takes about 7 seconds.  Sent to a 5.3.2-SNAPSHOT dev server
> with cold caches, it takes about 15 seconds -- because that server is
> particularly low on memory.  Once the query is cached, it takes 100
> milliseconds or less, even on the dev server.
>
> Checking one of the shard indexes with the schema browser, ip has 34
> unique terms, feature has 108 unique terms, collection has 824 unique
> terms, restr has 128 unique terms.  As expected, doc_date has about 12
> million unique terms for the shard.  It is a TrieDateField with a
> precisionStep of 16.  The rest of the shards have similar unique term
> counts.  The entire sharded index has 244 million documents in it - six
> shards with 40.6 million each and one shard with under 500K documents.
>
> I've been trying to figure out why this query is so slow.  I can't see
> anything obvious, but I did encounter something really weird in the
> debug output.
>
> This is the params section of the response -- as you can see, echoParams
> is set to all, and you can see the shards parameter defined in
> solrconfig.xml:
>
> http://apaste.info/pGe
>
> This is the filter query info from the debug -- showing the same set of
> filters seven times, which I assume is because there are seven shards.
> I do not know if this is a debug glitch.  The response info here is from
> the dev server, but the production servers give the same info:
>
> http://apaste.info/HpC
>
> Does anyone have thoughts about the repeated filter information in the
> debug output, or why it takes several seconds for this query to run?
>
> General performance on the production index is pretty good.  Over the
> last 9800 queries, the production server has a median qtime of 238ms and
> a 95th percentile of 3672ms.  The query rate is less than one per second.
>
> Thanks,
> Shawn
>


Re: Where/howto store store.xml in Zookeeper?

2015-12-16 Thread Shawn Heisey
On 12/16/2015 5:18 AM, Andrej van der Zee wrote:
> I have tried several variations to upload solr.xml to Zookeeper like these:
>
>  /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir
> /etc/zookeeper/solr.xml -confname solr -z 1.2.3.4:2181
>
> But somehow the Solr instances cant find it.

The "upconfig" command uploads an entire collection configuration to
zookeeper under the /configs location in the ZK database.  It doesn't
handle solr.xml or anything else at the root of the database.
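
If memory serves, solr.xml itself goes to the root of the ZK tree with the
putfile command instead, something like (treat the exact paths as
placeholders):

/opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 1.2.3.4:2181 -cmd putfile /solr.xml /path/to/solr.xml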

This is what you need:

https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files#UsingZooKeepertoManageConfigurationFiles-PreparingZooKeeperbeforefirstclusterstart

Thanks,
Shawn



Re: Append fields to a document

2015-12-16 Thread Jamie Johnson
The expense is in gathering the pieces to do the indexing.  There isn't
much that I can do in that regard, unfortunately.  I need to investigate
storing the fields; if they aren't returned, is the expense just size on
disk, or is there a memory cost as well?
On Dec 16, 2015 7:43 PM, "Alexandre Rafalovitch"  wrote:

> ExternalFileField might be useful in some situations.
>
> But also, is it possible that your Solr schema configuration is not
> best suited for your domain? Is it - for example - possible that the
> additional data should be in child records?
>
> Pure guesswork here, not enough information. But, as described, Solr
> will not be able to fulfill your needs easily. Something will need to
> change.
>
> Regards,
>Alex.
>
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 16 December 2015 at 22:09, Jamie Johnson  wrote:
> > I have a use case where we only need to append some fields to a document.
> > To retrieve the full representation is very expensive but I can easily
> get
> > the deltas.  Is it possible to just add fields to an existing Solr
> > document?  I experimented with using overwrite=false, but that resulted
> in
> > two documents with the same uniqueKey in the index (which makes sense).
> Is
> > there a way to accomplish what I'm looking to do in Solr?  My fields
> aren't
> > all stored and think it will be too expensive for me to make that change.
> > Any thoughts would be really appreciated.
>


Re: solr cloud invalid shard/collection configuration

2015-12-16 Thread ig01
Can someone please advise considering my previous answer?





Trying to index document in Solr with solr-spark library

2015-12-16 Thread Guillermo Ortiz
I'm getting some errors when I try to use the solr-spark library, getting
the error *KeeperErrorCode = NoNode for /live_nodes*.

I downloaded the library and compiled it against branch_4.x since I'm using
Cloudera 5.5.1 and Solr 4.10.3.

I checked the logs of Solr and ZooKeeper and didn't find any error, and
navigating inside ZooKeeper shows the collection is created. These errors
happen in the Spark executors.


2015-12-16 16:31:43,923 [Executor task launch worker-1] INFO
org.apache.zookeeper.ZooKeeper - Session: 0x1519126c7d55b23 closed

2015-12-16 16:31:43,924 [Executor task launch worker-1] ERROR org.apache.
spark.executor.Executor - Exception in task 5.2 in stage 12.0 (TID 218)
org.apache.solr.common.cloud.ZooKeeperException:
at org.apache.solr
.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:252)
at com.lucidworks.spark.SolrSupport.getSolrServer(SolrSupport.java:67)
at com.lucidworks.spark.SolrSupport$4.call(SolrSupport.java:162)
at com.lucidworks.spark.SolrSupport$4.call(SolrSupport.java:160)
at org.apache.spark
.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222)
at org.apache.spark
.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222)
at org.apache.spark
.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898)
at org.apache.spark
.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:898)
at org.apache.spark
.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark
.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
*Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /live_nodes*
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
at org.apache.solr
.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:290)
at org.apache.solr
.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:287)
at org.apache.solr
.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
at org.apache.solr
.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:287)
at org.apache.solr
.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:334)
at org.apache.solr
.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:243)
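
One thing worth checking, as an assumption based on how Cloudera usually
lays things out: Solr's ZooKeeper data is typically chrooted under /solr
there, and a zkHost string without that chroot makes the client look for
/live_nodes at the ZK root, which is exactly this NoNode error. A minimal
SolrJ 4.x sketch:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;

    // note the trailing /solr chroot on the zkHost string
    CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
    solr.setDefaultCollection("collection1"); // collection name is a placeholder
    solr.connect();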


Re: query to get parents without childs

2015-12-16 Thread Binoy Dalal
You could try simply doing a NOT query to find all those docs that do not
contain the child fields, like -fq=<child_field>:*
Since the index is flat, the "children" are like any other fields to lucene
and so this should work

On Thu, 17 Dec 2015, 04:33 Novin  wrote:

> "Index the number of children into the parent as an integer" is nice and
> easy solution. But I would like to know about"
>
> You could probably do that inside an UpdateProcessor, even using the
> Javascript ScriptUpdateProcessor. Probably simpler though in the code
> that pushes the docs to Solr." either.
> Can you point me to any documentation related to above, would be very
> helpful.
>
> Thanks
>
>
> On 16/12/2015 21:52, Upayavira wrote:
> > So that's a good question - how do you identify parent documents that
> > *do not* have child documents.
> >
> > I'm not sure how you would do that. However, you could index the number
> > of children into the parent as an integer, then it would be easy.
> >
> > You could probably do that inside an UpdateProcessor, even using the
> > Javascript ScriptUpdateProcessor. Probably simpler though in the code
> > that pushes the docs to Solr.
> >
> > Upayavira
> >
> > On Wed, Dec 16, 2015, at 09:05 PM, Novin Novin wrote:
> >> Hi Scott,
> >>
> >> Actually, it is not a multivalued field; it is a nested document.
> >>
> >> Novin
> >>
> >> On 16 December 2015 at 20:33, Scott Stults <
> >> sstu...@opensourceconnections.com> wrote:
> >>
> >>> Hi Novin,
> >>>
> >>> How are you associating parents with children? Is it a "children"
> >>> multivalued field in the parent record? If so you could query for
> records
> >>> that don't have a value in that field like "-children:[* TO *]"
> >>>
> >>> k/r,
> >>> Scott
> >>>
> >>> On Wed, Dec 16, 2015 at 7:29 AM, Novin Novin 
> wrote:
> >>>
>  Hi guys,
> 
>  I have a few parent documents indexed without children; what would be the
>  query to get those?
> 
>  Thanks,
>  Novin
> 
> >>>
> >>>
> >>> --
> >>> Scott Stults | Founder & Solutions Architect | OpenSource Connections,
> LLC
> >>> | 434.409.2780
> >>> http://www.opensourceconnections.com
> >>>
>
> --
Regards,
Binoy Dalal


Re: Issues when indexing PDF files

2015-12-16 Thread Alexandre Rafalovitch
They could be using custom fonts and non-Unicode characters. That's
probably something to explore with PDF specific tools.
On 17 Dec 2015 1:37 pm, "Zheng Lin Edwin Yeo"  wrote:

> I've checked all the files which have problems with the content in the Solr
> index using the Tika app. All of them show the same issues as what I see
> in the Solr index.
>
> So does the issue lie with the encoding of the file? Are we able to check
> the encoding of the file?
>
>
> Regards,
> Edwin
>
>
> On 17 December 2015 at 00:33, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Erik,
> >
> > I've shared the file on dropbox, which you can access via the link here:
> > https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0
> >
> > This is what I get from the Tika app after dropping the file in.
> >
> > Content-Length: 75092
> > Content-Type: application/pdf
> > Type: COSName{Info}
> > X-Parsed-By: org.apache.tika.parser.DefaultParser
> > X-TIKA:digest:MD5: de67120e29ec7ffa24aec7e17104b6bf
> > X-TIKA:digest:SHA256:
> > d0f04580d87290c1bc8068f3d5b34d797a0d8ccce2b18f626a37958c439733e7
> > access_permission:assemble_document: true
> > access_permission:can_modify: true
> > access_permission:can_print: true
> > access_permission:can_print_degraded: true
> > access_permission:extract_content: true
> > access_permission:extract_for_accessibility: true
> > access_permission:fill_in_form: true
> > access_permission:modify_annotations: true
> > dc:format: application/pdf; version=1.3
> > pdf:PDFVersion: 1.3
> > pdf:encrypted: false
> > producer: null
> > resourceName: Desmophen+670+BAe.pdf
> > xmpTPg:NPages: 3
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 17 December 2015 at 00:15, Erik Hatcher 
> wrote:
> >
> >> Edwin - Can you share one of those PDF files?
> >>
> >> Also, drop the file into the Tika app and see what it sees directly -
> get
> >> the tika-app JAR and run that desktop application.
> >>
> >> Could be an encoding issue?
> >>
> >> Erik
> >>
> >> —
> >> Erik Hatcher, Senior Solutions Architect
> >> http://www.lucidworks.com 
> >>
> >>
> >>
> >> > On Dec 16, 2015, at 10:51 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I'm using Solr 5.3.0
> >> >
> >> > I'm indexing some PDF documents. However, for certain PDF files, there
> >> are
> >> > chinese text in the documents, but after indexing, what is indexed in
> >> the
> >> > content is either a series of "??" or an empty content.
> >> >
> >> > I'm using the post.jar that comes together with Solr.
> >> >
> >> > What could be the reason that causes this?
> >> >
> >> > Regards,
> >> > Edwin
> >>
> >>
> >
>


query to get parents without childs

2015-12-16 Thread Novin Novin
Hi guys,

I have a few parent documents indexed without children; what would be the query
to get those?

Thanks,
Novin


Re: integrate solr with preprocessor tools

2015-12-16 Thread Emir Arnautovic

Hi Sara,
I would recommend looking at the code of some component that you use 
currently and starting from that - you can extend that class or use it as 
a template for your own.
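
As a bare-bones, hedged sketch of what such a class looks like (the
normalize() body is a placeholder for your own preprocessing):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public final class MyPersianNormalizerFilter extends TokenFilter {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

      public MyPersianNormalizerFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;      // no more tokens from the upstream tokenizer
        }
        String normalized = normalize(termAtt.toString());
        termAtt.setEmpty().append(normalized);
        return true;
      }

      private String normalize(String token) {
        return token;        // plug your external normalizer in here
      }
    }

You would pair it with a small TokenFilterFactory subclass so it can be
referenced from the field type in schema.xml.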


Thanks,
Emir

On 16.12.2015 09:58, sara hajili wrote:

hi Emir, thanks for answering.
Now my question is: how do I write this class?
Must I use Solr interfaces?
I see in the above link that I can use a Solr analyzer, but how do I use that?
Please tell me how to start writing my own analyzer, step by step...
Which interface can I use and change to achieve my goal?
Thanks

On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Sara,
You need to wrap your code in tokenizer or token filter
https://wiki.apache.org/solr/SolrPlugins

If you want to improve existing and believe others can benefit from
improvement, you can open ticket and submit patch.

Thanks,
Emir


On 09.12.2015 10:41, sara hajili wrote:


hi, I want to use Solr, and the language of the documents I store in
Solr is Persian.
Solr doesn't support Persian as well as I want, so I found preprocessor
tools like a normalizer, tokenizer, etc.
I don't want to use the Solr Persian filters, like the Persian tokenizer,
as they are; I mean I want to improve them.

Now my question is: how can I integrate Solr with these external
preprocessor tools?

Thanks



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi everyone!

I think I should not have posted my server name... never had that many
access attempts...



2015-12-16 9:03 GMT-02:00 Evert R. :

> Hello Erick,
>
> Thanks again for your time.
>
> Here is as far as I have gone:
>
> 1. I started a fresh install and did the following:
>
> [evert@nix]$ bin/solr start -e techproducts
> [evert@nix]$ curl '
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true'
> -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
>
> 2. I am using only the Solr Admin UI to check the query respond, here is
> an example:
>
> Query:
> http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> Result: {
>   "responseHeader": {
> "status": 0,
> "QTime": 14,
> "params": {
>   "q": "nietava",
>   "hl": "true",
>   "hl.simple.post": "",
>   "indent": "true",
>   "fl": "id, author, content",
>   "wt": "json",
>   "hl.simple.pre": "",
>   "_": "1450262674102"
> }
>   },
>   "response": {
> "numFound": 1,
> "start": 0,
> "docs": [
>   {
> "id": "pdf1",
> "author": "Wander",
> "content": [
>   "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n\n
> Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
> Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
> \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n \n
> \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
> Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
> www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
> Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
> \n“A Vida no Mundo Espiritual” \n"
> ]
>   }
> ]
>   },
>   "highlighting": {
> "pdf1": {}
>   }
> }
>
> **On the content it brings the whole pdf content (book), and notice that
> in the highlight it shows empty.
>
> I tried creating a new core with bin/solr create -c test, using the
> schema.xml and solrconfig.xml standard found in
> /solr/server/solr/configsets/basic_configs/conf
>
> But even though... not working as expected (I think).
>
>
> Would you know how to set this techproducts example to bring the snippets
> of text?
>
> The server only allows specific ip address for this port, if you would, I
> could get it open for you to check.
>
>
> Thanks again and best regards!
>
>
>
>
> *Evert Ramos*
> *evert.ra...@gmail.com *
>
>
> 2015-12-15 18:14 GMT-02:00 Erick Erickson :
>
>> No, that's not what I meant. The highlight component adds a special
>> section to the return packet that will contain "snippets" of text with
>> highlights. You control how big those snippets are via various
>> parameters in the highlight component and they'll have the tags you
>> specify for highlighting.
>>
>> Your app needs to pull the information from the highlight portion of
>> the response packet rather than the document list. Just execute your
>> queries via cURL or a browser to see the structure of a response to
>> see what I mean.
>>
>> And note that you do _not_ need to return the fields you're
>> highlighting in the "fl" list so you do _not_ need to return the
>> entire document contents.
>>
>> What are you using to display the results anyway?
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 15, 2015 at 10:02 AM, Evert R.  wrote:
>> > Hi Erick,
>> >
>> > Thank you very much for the reply!!
>> >
>> > I do get back the full text, author, and a whole lot of stuff which
>> doesn't
>> > really matter for my project.
>> >
>> > So, what you are saying is that the solr gets me back the full content
>> and
>> > my application will fix the rest? Which means for me that all my books
>> (pdf
>> > files) when searching for an specific word it will bring me the whole
>> book
>> > content that has the requested query. And my application (php) in this
>> > case... will take care of show only part of the text (such as in
>> highlight,
>> > as I was understandind) and hightlight the key word I was looking for?
>> >
>> > If so, Erick, you gave me a big help clearing out... I thought I would
>> do
>> > that with Solr in an easy way. =)
>> >
>> > Thanks for the attachements tip!
>> >
>> > Best regards,
>> >
>> > Evert
>> >
>> > 2015-12-15 14:56 GMT-02:00 Erick Erickson :
>> >
>> >> How are you trying to display the results? Highlighting is a bit of an
>> >> odd beast. Assuming it's correctly configured, the response packet
>> >> will have a separate highlight section, it's the application's
>> >> responsibility to present that pleasingly.
>> >>
>> What _do_ you get back in the response?
>>
>> BTW, the mail server pretty aggressively strips attachments; yours
>> didn't come through.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Tue, Dec 15, 2015 at 

Where/howto store solr.xml in Zookeeper?

2015-12-16 Thread Andrej van der Zee
Hi,

When I start a Solr cloud instance, I keep getting this in the log:

800  INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client is connected to
ZooKeeper
800  INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using default ZkACLProvider
805  INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from
SolrHome (not found in ZooKeeper)
807  INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container
configuration from /opt/solr/server/solr/solr.xml
852  INFO  (main) [   ] o.a.s.c.CoresLocator Config-defined core root
directory: /opt/solr/server/solr

I have tried several variations to upload solr.xml to Zookeeper like these:

 /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir
/etc/zookeeper/solr.xml -confname solr -z 1.2.3.4:2181

But somehow the Solr instances can't find it.
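
(For reference: upconfig uploads a whole configuration directory; a single
file such as solr.xml is uploaded with the putfile command instead - a
sketch, assuming a Solr release whose zkcli.sh supports putfile:

  /opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 1.2.3.4:2181 \
    -cmd putfile /solr.xml /path/to/solr.xml

This places the file at the ZooKeeper root path /solr.xml, which appears to
be where the "Loading solr.xml from SolrHome (not found in ZooKeeper)"
check looks.)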

Thanks,
Andrej


Permutations of entries in a multivalued field

2015-12-16 Thread Johannes Riedl

Hello all,

we are facing the following problem: we use a multivalued string field
that contains entries of the kind A/B/C, where A, B, C are terms.
We are now looking for a simple way to also find all permutations of
A/B/C, e.g. B/A/C. As a workaround we added a new field that contains
all entries alphabetically sorted, and we guarantee the sorting on the
user side. However - since this is limited in some ways - is there a
simple way to either index in a way such that solely A/B/C and all its
permutations are found (using e.g. type=text is not an option, since a
term could occur in a different entry of the multivalued field), or to
trigger an alphabetical sorting of incoming queries?
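
For illustration, the sorted-field workaround boils down to canonicalizing
each entry before both indexing and querying - a minimal sketch in plain
Java (class and method names are made up):

  import java.util.Arrays;

  public class EntryCanonicalizer {
    // Sorts the '/'-separated terms so that A/B/C and B/A/C
    // produce the same indexed value.
    public static String canonicalize(String entry) {
      String[] terms = entry.split("/");
      Arrays.sort(terms);
      return String.join("/", terms);
    }

    public static void main(String[] args) {
      System.out.println(canonicalize("B/A/C")); // prints A/B/C
    }
  }

Applying the same canonicalization to incoming query values would give
permutation-insensitive matching without any schema tricks.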


Thanks a lot for your feedback, best regards

Johannes



Re: minimum should match, cant explain the amount of hits

2015-12-16 Thread Binoy Dalal
The edismax documentation confirms that when a positive % value is
provided, Solr will round down. If you want Solr to round up, set your
parameter value to '-35%'.
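
For example, the two spellings side by side as request handler defaults - a
sketch for solrconfig.xml (the handler name is illustrative):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- rounds down: 3 terms * 65% = 1.95 -> only 1 term required -->
      <str name="mm">2&lt;65%</str>
      <!-- rounds up: 35% of 3 terms = 1.05 -> 1 term may be missing,
           so 2 terms are required -->
      <!-- <str name="mm">2&lt;-35%</str> -->
    </lst>
  </requestHandler>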

On Wed, 16 Dec 2015, 17:28 Binoy Dalal  wrote:

> My guess is that Solr is rounding down while calculating the number of
> mandatory terms.
> In your case, there are 3 terms, 65% of which is 1.95, which rounded down
> is 1, but 67% is 2.01, which rounded down is 2, which conforms with the
> results you're seeing.
>
> Maybe someone else can confirm this.
>
> On Wed, 16 Dec 2015, 16:56 Ron van der Vegt 
> wrote:
>
>> Hi,
>>
>> I'm currently searching with the following query: q="sony+led+tv".
>> The minimum should match setting is set to: mm=2<65%.
>> So when there are more than two terms, at least 65% of the terms should
>> match.
>> I'm not using the StopFilterFactory.
>>
>> When turning on debug, this is the parsedquery_toString:
>>
>> +(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0
>> | breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 |
>> salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15
>> (categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 |
>> breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led |
>> brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 |
>> name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 |
>> text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 |
>> salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15
>>
>> While I expect that at least two terms should match, because of the 65%,
>> I'm also getting hits on documents which seem to match only one of
>> the terms. Below is the explain of the hit, which shouldn't be there:
>>
>> 2.6449876 = sum of:
>>2.6449876 = sum of:
>>  2.6449876 = max plus 0.15 times others of:
>>2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
>>  2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0
>> ), product of:
>>2.6449876 = idf(docFreq=3254, maxDocs=45833)
>>1.0 = tfNorm, computed from:
>>  1.0 = termFreq=1.0
>>  1.0 = parameter k1
>>  0.0 = parameter b (norms omitted for field)
>>
>> When I change the mm to 2<67% then I get the number of results that I
>> expect with 65%, but if I understand correctly then all the terms should
>> match. (33,33% + 33,33% = 66,66% is always less than 67%). Did I miss
>> something, or is there something else that could affect the minimum
>> should match setting?
>>
>> Thanks in advance!
>>
>> Ron
>>
> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Andrea Gazzarini
hl=f.content.hl.content (I guess) is definitely wrong. Some questions:

   - First, sorry, the obvious question: are you sure the documents contain
   the "nietava" term?
   - Could you try to use q=content:nietava?
   - Could you paste the definition (field & fieldtype) of the content
   field?

> Should I have this configuration in the XML file?

You could, but it's up to you and it strongly depends on your context. The
simple thing is that if you have those parameters within the configuration
you can avoid passing them (as part of the requests), but probably in this
phase, where you are testing, it's better to have them there (in the
request).
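
For reference, a corrected request would name the field in hl.fl - a sketch
(the hl.snippets value is arbitrary):

  http://localhost:8983/solr/techproducts/select?q=content:nietava&fl=id&wt=json&hl=true&hl.fl=content&hl.snippets=2&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E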

Andrea

2015-12-16 15:28 GMT+01:00 Evert R. :

> Hi Andrea,
>
> Thanks for the reply!
>
> I tried with the hl.fl parameter as well, using as below:
>
>
> http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&
> hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
>
> with the parameter under the hl field in the solr ui:
>
> 1. f.content.hl.snnipets=2
> 2. f.content.hl.content=4
> 3. content
>
> with no success...
>
> Should I have this configuration in the XML file?
>
> Regards,
>
> *Evert *
>
> 2015-12-16 11:23 GMT-02:00 Andrea Gazzarini :
>
> > Hi Evert,
> > what is the configuration of the default request handler? Did you set the
> > hl.fl parameter?
> >
> > Please check here [1] the parameters that the highlighting component
> > expects. Required parameters should be in the query string or declared
> > within the request handler which answers to your query.
> >
> > Andrea
> >
> > [1] https://wiki.apache.org/solr/HighlightingParameters
> >
> >
> >
> >
> > 2015-12-16 12:51 GMT+01:00 Evert R. :
> >
> > > Hi everyone!
> > >
> > > I think I should not have posted my server name... never had that many
> > > access attempts...
> > >
> > >
> > >
> > > 2015-12-16 9:03 GMT-02:00 Evert R. :
> > >
> > > > Hello Erick,
> > > >
> > > > Thanks again for your time.
> > > >
> > > > Here is as far as I have gone:
> > > >
> > > > 1. I started a fresh install and did the following:
> > > >
> > > > [evert@nix]$ bin/solr start -e techproducts
> > > > [evert@nix]$ curl '
> > > >
> > >
> >
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true
> > > '
> > > > -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> > > >
> > > > 2. I am using only the Solr Admin UI to check the query respond, here
> > is
> > > > an example:
> > > >
> > > > Query:
> > > > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> > > >
> > > > Result: {
> > > >   "responseHeader": {
> > > > "status": 0,
> > > > "QTime": 14,
> > > > "params": {
> > > >   "q": "nietava",
> > > >   "hl": "true",
> > > >   "hl.simple.post": "",
> > > >   "indent": "true",
> > > >   "fl": "id, author, content",
> > > >   "wt": "json",
> > > >   "hl.simple.pre": "",
> > > >   "_": "1450262674102"
> > > > }
> > > >   },
> > > >   "response": {
> > > > "numFound": 1,
> > > > "start": 0,
> > > > "docs": [
> > > >   {
> > > > "id": "pdf1",
> > > > "author": "Wander",
> > > > "content": [
> > > >   "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n
> \n
> > > > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
> > > > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo
> Espiritual”
> > > \n
> > > > \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n
> \n
> > > \n
> > > > \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua
> > Souza
> > > > Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
> > > > www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
> > > > Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n
> > Coleção
> > > > \n“A Vida no Mundo Espiritual” \n"
> > > > ]
> > > >   }
> > > > ]
> > > >   },
> > > >   "highlighting": {
> > > > "pdf1": {}
> > > >   }
> > > > }
> > > >
> > > > **On the content it brings the whole pdf content (book), and notice
> > that
> > > > in the highlight it shows empty.
> > > >
> > > > I tried creating a new core with bin/solr create -c test, using the
> > > > schema.xml and solrconfig.xml standard found in
> > > > /solr/server/solr/configsets/basic_configs/conf
> > > >
> > > > But even though... not working as expected (I think).
> > > >
> > > >
> > > > Would you know how to set this techproducts example to bring the
> > snippets
> > > > of text?
> > > >
> > > > The server only allows specific ip address for this port, if you
> > would, I
> > > > could get it open for you to check.
> > > >
> > > >
> > > > Thanks again and best regards!
> > > >
> > > >
> > > >
> > > >
> > > > *Evert
> > > >
> > > >
> > > > 2015-12-15 18:14 GMT-02:00 Erick 

Re: Collection API migrate statement

2015-12-16 Thread philippa griggs
Hello,

Thanks for your reply.  

As you suggested, I've tried running the operation along with the async
parameter, and it works - thank you. My next question is: is there any way of
finding out more information on the completed task? As I'm currently testing
the new Solr configuration, it would be handy to know the runtime of the
operation.

Many thanks

Philippa


From: Shalin Shekhar Mangar 
Sent: 15 December 2015 19:05
To: solr-user@lucene.apache.org
Subject: Re: Collection API migrate statement

The migrate is a long running operation. Please use it along with the
async=<request-id> parameter so that it can execute in
the background. Then you can use the request status API to poll and
wait until the operation completes. If there is any error then the
same request status API will return the response. See
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus
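
For example, if the operation was started with async=1000, polling would
look like this (a sketch; the request id is whatever you passed in):

  curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000&wt=json"

The response reports the request state (submitted, running, completed or
failed), which you can poll until completion.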

On Tue, Dec 15, 2015 at 9:27 PM, philippa griggs
 wrote:
> Hello,
>
>
> Solr 5.2.1.
>
>
> I'm using the collection API migrate statement in our test environment with 
> the view to implement a Hot, Cold arrangement- newer documents will be kept 
> on the Hot collection and each night the oldest documents will be migrated 
> into the Cold collection. I've got it all working with a small amount of 
> documents (around 28,000).
>
>
> I'm now trying to migrate around 200,000 documents and am getting 'migrate 
> the collection time out:180s'  message back.
>
>
> The logs from the source collection are:
>
>
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created 
> replica of temp source collection on target leader node
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp 
> source collection replica to target leader
> INFO  - 2015-12-15 14:45:36.648; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on 
> path /overseer/collection-queue-work/qnr-04 state SyncConnected
> INFO  - 2015-12-15 14:45:36.651; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/collection-queue-work state SyncConnected
> ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: migrate the collection time out:180s
> at 
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
> etc
>
>
> The logs from the target collection are:
>
> INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; 
> Took 22 ms to seed version buckets with highest version 1520634636692094979
> INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] 
> org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. 
> core=split_shard1_temp_shard2_shard1_replica2
> INFO  - 2015-12-15 14:43:19.199; [   ] 
> org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}
>
> As there are no errors in the target collection, am I right in assuming the
> timeout occurred because the merge took too long? If that is so, how do I
> increase the timeout period? Ideally I will need to migrate around 2 million
> documents a night.
>
>
> Any help would be much appreciated.
>
>
> Philippa
>
>



--
Regards,
Shalin Shekhar Mangar.


Ugh! My term is the entire record

2015-12-16 Thread Mark Fenbers

Greetings,

I had my Solr searching capabilities working for a while.  But today I
inadvertently "unload"ed my core from the Admin Interface. After adding
it back in, it is not working right. Because Solr was down for a while
in recent weeks, I have also done a full import with the clean option.
So now, searches on words like Ohio or forecast (both very popular
words in the documents) return 0 results.


In Schema Browser, "Show Term Info" now reveals that my terms are the 
*entire* text string record instead of individual words.  I had come 
across this issue before, during initially setting up Solr, but now I 
can't remember what I had done to get it to index each *word* instead of 
the entire String stored in the DB record.


Can someone please point me to the trick that does the proper parsing 
and indexing of *each word* in each record?


thanks!
Mark


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Andrea,

Thanks for the reply!

I tried with the hl.fl parameter as well, using as below:

http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&
hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

with the parameter under the hl field in the solr ui:

1. f.content.hl.snnipets=2
2. f.content.hl.content=4
3. content

with no success...

Should I have this configuration in the XML file?

Regards,

*Evert *

2015-12-16 11:23 GMT-02:00 Andrea Gazzarini :

> Hi Evert,
> what is the configuration of the default request handler? Did you set the
> hl.fl parameter?
>
> Please check here [1] the parameters that the highlighting component
> expects. Required parameters should be in the query string or declared
> within the request handler which answers to your query.
>
> Andrea
>
> [1] https://wiki.apache.org/solr/HighlightingParameters
>
>
>
>
> 2015-12-16 12:51 GMT+01:00 Evert R. :
>
> > Hi everyone!
> >
> > I think I should not have posted my server name... never had that many
> > access attempts...
> >
> >
> >
> > 2015-12-16 9:03 GMT-02:00 Evert R. :
> >
> > > Hello Erick,
> > >
> > > Thanks again for your time.
> > >
> > > Here is as far as I have gone:
> > >
> > > 1. I started a fresh install and did the following:
> > >
> > > [evert@nix]$ bin/solr start -e techproducts
> > > [evert@nix]$ curl '
> > >
> >
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true
> > '
> > > -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> > >
> > > 2. I am using only the Solr Admin UI to check the query respond, here
> is
> > > an example:
> > >
> > > Query:
> > > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> > >
> > > Result: {
> > >   "responseHeader": {
> > > "status": 0,
> > > "QTime": 14,
> > > "params": {
> > >   "q": "nietava",
> > >   "hl": "true",
> > >   "hl.simple.post": "",
> > >   "indent": "true",
> > >   "fl": "id, author, content",
> > >   "wt": "json",
> > >   "hl.simple.pre": "",
> > >   "_": "1450262674102"
> > > }
> > >   },
> > >   "response": {
> > > "numFound": 1,
> > > "start": 0,
> > > "docs": [
> > >   {
> > > "id": "pdf1",
> > > "author": "Wander",
> > > "content": [
> > >   "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n\n
> > > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
> > > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual”
> > \n
> > > \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n
> > \n
> > > \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua
> Souza
> > > Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
> > > www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
> > > Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n
> Coleção
> > > \n“A Vida no Mundo Espiritual” \n"
> > > ]
> > >   }
> > > ]
> > >   },
> > >   "highlighting": {
> > > "pdf1": {}
> > >   }
> > > }
> > >
> > > **On the content it brings the whole pdf content (book), and notice
> that
> > > in the highlight it shows empty.
> > >
> > > I tried creating a new core with bin/solr create -c test, using the
> > > schema.xml and solrconfig.xml standard found in
> > > /solr/server/solr/configsets/basic_configs/conf
> > >
> > > But even though... not working as expected (I think).
> > >
> > >
> > > Would you know how to set this techproducts example to bring the
> snippets
> > > of text?
> > >
> > > The server only allows specific ip address for this port, if you
> would, I
> > > could get it open for you to check.
> > >
> > >
> > > Thanks again and best regards!
> > >
> > >
> > >
> > >
> > > *Evert
> > >
> > >
> > > 2015-12-15 18:14 GMT-02:00 Erick Erickson :
> > >
> > >> No, that's not what I meant. The highlight component adds a special
> > >> section to the return packet that will contain "snippets" of text with
> > >> highlights. You control how big those snippets are via various
> > >> parameters in the highlight component and they'll have the tags you
> > >> specify for highlighting.
> > >>
> > >> Your app needs to pull the information from the highlight portion of
> > >> the response packet rather than the document list. Just execute your
> > >> queries via cURL or a browser to see the structure of a response to
> > >> see what I mean.
> > >>
> > >> And note that you do _not_ need to return the fields you're
> > >> highlighting in the "fl" list so you do _not_ need to return the
> > >> entire document contents.
> > >>
> > >> What are you using to display the results anyway?
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Tue, Dec 15, 2015 at 10:02 AM, Evert R. 
> > wrote:
> > >> > Hi Erick,
> > >> >
> > >> > 

Append fields to a document

2015-12-16 Thread Jamie Johnson
I have a use case where we only need to append some fields to a document.
To retrieve the full representation is very expensive but I can easily get
the deltas.  Is it possible to just add fields to an existing Solr
document?  I experimented with using overwrite=false, but that resulted in
two documents with the same uniqueKey in the index (which makes sense).  Is
there a way to accomplish what I'm looking to do in Solr?  My fields aren't
all stored, and I think it will be too expensive for me to make that change.
Any thoughts would be really appreciated.


Timeouts for create_collection

2015-12-16 Thread Andrej van der Zee
Hi,

I am a newbie to Solr and I am having difficulties setting up a cluster with a
single Zookeeper instance and two Solr instances. The Solr instances both
successfully establish sessions with the Zookeeper and I am able to upload
collection configs to Zookeeper, but somehow creating a collection from one
of the Solr instances times out:

core@ip-172-31-11-63:/opt/solr# ./bin/solr create_collection -c connects
-replicationFactor 2

Connecting to ZooKeeper at 172.31.11.65:2181 ...
Re-using existing configuration directory connects

Creating new collection 'connects' using command:
http://localhost:8984/solr/admin/collections?action=CREATE&name=connects&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=connects

ERROR: Failed to create collection 'connects' due to: create the collection
time out:180s


Another thing that got my attention is that /clusterstate.json is
empty, even after the Solr instances establish their sessions. I am not
sure whether this is related, or whether the clusterstate is supposed to be
empty until I successfully create a collection.

Thanks,
Andrej


Re: Timeouts for create_collection

2015-12-16 Thread Andrej van der Zee
Hi,

I completely started over again. Now I get the following error upon
create_collection:

solr@ip-172-31-11-63:/opt/solr$ ./bin/solr create_collection -c connects
-replicationFactor 2

Connecting to ZooKeeper at 172.31.11.65:2181 ...
Re-using existing configuration directory connects

Creating new collection 'connects' using command:
http://localhost:8984/solr/admin/collections?action=CREATE&name=connects&numShards=1&replicationFactor=2&maxShardsPerNode=1&collection.configName=connects

ERROR: Failed to create collection 'connects' due to:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
from server at http://172.31.11.63:8984/solr: Error CREATEing SolrCore
'connects_shard1_replica1': Unable to create core
[connects_shard1_replica1] Caused by: Can't find resource 'solrconfig.xml'
in classpath or '/configs/connects', cwd=/opt/solr/server


When I look in Zookeeper, I can see that solrconfig.xml is in
/configs/connects/conf/. Why can't the Solr instance find it? Am I missing
some crucial configuration setting?
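
(For reference: the error message says Solr looked directly under
'/configs/connects', so the files appear to be nested one conf/ level too
deep. A sketch of re-uploading so they land at the top, assuming the stock
zkcli.sh and the addresses used earlier in this thread:

  /opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 172.31.11.65:2181 \
    -cmd upconfig -confdir /path/to/connects/conf -confname connects

upconfig uploads the contents of -confdir, so pointing it at the conf
directory itself avoids the extra level.)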

Thanks,
Andrej


Re: Ugh! My term is the entire record

2015-12-16 Thread Binoy Dalal
What is the type of the fields in question?
What you're seeing will happen if a field is of type string. If this is the
case then try changing your field type to text_en or text_general depending
on your requirements.
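
For example, in schema.xml (a sketch; the field name is illustrative):

  <field name="report_text" type="text_general" indexed="true" stored="true"/>

A string field indexes the whole value as a single term, while text_general
runs a tokenizer so that each word becomes its own term.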

On Wed, 16 Dec 2015, 19:51 Mark Fenbers  wrote:

> Greetings,
>
> I had my Solr searching capabilities working for a while.  But today I
> inadvertently "unload"d my core from the Admin Interface. After adding
> it back in, it is not working right. Because Solr was down for a while
> in recent weeks, I have also done a full import with the clean option.
> So now, searching on words like Ohio or forecast (both very popular
> words in the documents) return 0 results.
>
> In Schema Browser, "Show Term Info" now reveals that my terms are the
> *entire* text string record instead of individual words.  I had come
> across this issue before, during initially setting up Solr, but now I
> can't remember what I had done to get it to index each *word* instead of
> the entire String stored in the DB record.
>
> Can someone please point me to the trick that does the proper parsing
> and indexing of *each word* in each record?
>
> thanks!
> Mark
>
-- 
Regards,
Binoy Dalal


Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Andrea Gazzarini
Hi Evert,
what is the configuration of the default request handler? Did you set the
hl.fl parameter?

Please check here [1] the parameters that the highlighting component
expects. Required parameters should be in the query string or declared
within the request handler which answers to your query.

Andrea

[1] https://wiki.apache.org/solr/HighlightingParameters




2015-12-16 12:51 GMT+01:00 Evert R. :

> Hi everyone!
>
> I think I should not have posted my server name... never had that many
> access attempts...
>
>
>
> 2015-12-16 9:03 GMT-02:00 Evert R. :
>
> > Hello Erick,
> >
> > Thanks again for your time.
> >
> > Here is as far as I have gone:
> >
> > 1. I started a fresh install and did the following:
> >
> > [evert@nix]$ bin/solr start -e techproducts
> > [evert@nix]$ curl '
> >
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true
> '
> > -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> >
> > 2. I am using only the Solr Admin UI to check the query respond, here is
> > an example:
> >
> > Query:
> > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > Result: {
> >   "responseHeader": {
> > "status": 0,
> > "QTime": 14,
> > "params": {
> >   "q": "nietava",
> >   "hl": "true",
> >   "hl.simple.post": "",
> >   "indent": "true",
> >   "fl": "id, author, content",
> >   "wt": "json",
> >   "hl.simple.pre": "",
> >   "_": "1450262674102"
> > }
> >   },
> >   "response": {
> > "numFound": 1,
> > "start": 0,
> > "docs": [
> >   {
> > "id": "pdf1",
> > "author": "Wander",
> > "content": [
> >   "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n\n
> > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
> > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual”
> \n
> > \n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n
> \n
> > \n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
> > Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
> > www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
> > Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
> > \n“A Vida no Mundo Espiritual” \n"
> > ]
> >   }
> > ]
> >   },
> >   "highlighting": {
> > "pdf1": {}
> >   }
> > }
> >
> > **On the content it brings the whole pdf content (book), and notice that
> > in the highlight it shows empty.
> >
> > I tried creating a new core with bin/solr create -c test, using the
> > schema.xml and solrconfig.xml standard found in
> > /solr/server/solr/configsets/basic_configs/conf
> >
> > But even though... not working as expected (I think).
> >
> >
> > Would you know how to set this techproducts example to bring the snippets
> > of text?
> >
> > The server only allows specific ip address for this port, if you would, I
> > could get it open for you to check.
> >
> >
> > Thanks again and best regards!
> >
> >
> >
> >
> > *Evert Ramos*
> > *evert.ra...@gmail.com *
> >
> >
> > 2015-12-15 18:14 GMT-02:00 Erick Erickson :
> >
> >> No, that's not what I meant. The highlight component adds a special
> >> section to the return packet that will contain "snippets" of text with
> >> highlights. You control how big those snippets are via various
> >> parameters in the highlight component and they'll have the tags you
> >> specify for highlighting.
> >>
> >> Your app needs to pull the information from the highlight portion of
> >> the response packet rather than the document list. Just execute your
> >> queries via cURL or a browser to see the structure of a response to
> >> see what I mean.
> >>
> >> And note that you do _not_ need to return the fields you're
> >> highlighting in the "fl" list so you do _not_ need to return the
> >> entire document contents.
> >>
> >> What are you using to display the results anyway?
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Dec 15, 2015 at 10:02 AM, Evert R. 
> wrote:
> >> > Hi Erick,
> >> >
> >> > Thank you very much for the reply!!
> >> >
> >> > I do get back the full text, author, and a whole lot of stuff which
> >> doesn't
> >> > really matter for my project.
> >> >
> >> > So, what you are saying is that the solr gets me back the full content
> >> and
> >> > my application will fix the rest? Which means for me that all my books
> >> (pdf
> >> > files) when searching for an specific word it will bring me the whole
> >> book
> >> > content that has the requested query. And my application (php) in this
> >> > case... will take care of show only part of the text (such as in
> >> highlight,
> >> > as I was understandind) and hightlight the key word I was looking for?
> >> >
> >> > If so, Erick, you gave me a big help clearing out... I 

Solr cloud instance does not read cores from Zookeeper whilst connected

2015-12-16 Thread Andrej van der Zee
Hi,

I have set up Zookeeper and uploaded a collection config. But somehow it seems
that Solr keeps reading core definitions locally ("Looking for core
definitions underneath /opt/solr/server/solr") instead of getting them from
Zookeeper. Below the logs.

Probably some kind of config issue; unfortunately I cannot tell from the
documentation what is missing.

In the Solr GUI I can see the Cloud tree on Zookeeper. But I can see this
WARN in the GUI:
12/16/2015, 1:22:45 PM  WARN  null  ZookeeperInfoServlet  State for collection
connects not found in /clusterstate.json or
/collections/connects/state.json!

Is that related?

Thanks,
Andrej



Starting Solr in SolrCloud mode on port 8983 from /opt/solr/server

0INFO  (main) [   ] o.e.j.u.log Logging initialized @269ms
265  INFO  (main) [   ] o.e.j.s.Server jetty-9.2.11.v20150529
278  WARN  (main) [   ] o.e.j.s.h.RequestLogHandler !RequestLog
279  INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider Deployment monitor
[file:/opt/solr/server/contexts/] at interval 0
586  INFO  (main) [   ] o.e.j.w.StandardDescriptorProcessor NO JSP Support
for /solr, did not find org.apache.jasper.servlet.JspServlet
595  WARN  (main) [   ] o.e.j.s.SecurityHandler
ServletContext@o.e.j.w.WebAppContext@57fffcd7{/solr,file:/opt/solr/server/solr-webapp/webapp/,STARTING}{/opt/solr/server/solr-webapp/webapp}
has uncovered http methods for path: /
599  INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
SolrDispatchFilter.init(): WebAppClassLoader=1926764753@72d818d1
609  INFO  (main) [   ] o.a.s.c.SolrResourceLoader JNDI not configured for
solr (NoInitialContextEx)
609  INFO  (main) [   ] o.a.s.c.SolrResourceLoader using system property
solr.solr.home: /opt/solr/server/solr
610  INFO  (main) [   ] o.a.s.c.SolrResourceLoader new SolrResourceLoader
for directory: '/opt/solr/server/solr/'
719  WARN  (main) [   ] o.a.s.s.SolrDispatchFilter Solr property
solr.solrxml.location is no longer supported. Will automatically load
solr.xml from ZooKeeper if it exists
725  INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using default
ZkCredentialsProvider
744  INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to
connect to ZooKeeper
800  INFO  (zkCallback-1-thread-1) [   ] o.a.s.c.c.ConnectionManager
Watcher org.apache.solr.common.cloud.ConnectionManager@e250cde
name:ZooKeeperConnection Watcher:172.31.11.65:2181 got event WatchedEvent
state:SyncConnected type:None path:null path:null type:None
800  INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client is connected to
ZooKeeper
800  INFO  (main) [   ] o.a.s.c.c.SolrZkClient Using default ZkACLProvider
805  INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from
SolrHome (not found in ZooKeeper)
807  INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container
configuration from /opt/solr/server/solr/solr.xml
852  INFO  (main) [   ] o.a.s.c.CoresLocator Config-defined core root
directory: /opt/solr/server/solr
867  INFO  (main) [   ] o.a.s.c.CoreContainer New CoreContainer 510063093
867  INFO  (main) [   ] o.a.s.c.CoreContainer Loading cores into
CoreContainer [instanceDir=/opt/solr/server/solr/]
867  INFO  (main) [   ] o.a.s.c.CoreContainer loading shared library:
/opt/solr/server/solr/lib
868  WARN  (main) [   ] o.a.s.c.SolrResourceLoader No files added to
classloader from lib: lib (resolved as: /opt/solr/server/solr/lib).
879  INFO  (main) [   ] o.a.s.h.c.HttpShardHandlerFactory created with
socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost :
20,maxConnections : 1,corePoolSize : 0,maximumPoolSize :
2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy :
false,useRetries : false,
1009 INFO  (main) [   ] o.a.s.u.UpdateShardHandler Creating
UpdateShardHandler HTTP client with params:
socketTimeout=60=6=true
1010 INFO  (main) [   ] o.a.s.l.LogWatcher SLF4J impl is
org.slf4j.impl.Log4jLoggerFactory
1011 INFO  (main) [   ] o.a.s.l.LogWatcher Registering Log Listener [Log4j
(org.slf4j.impl.Log4jLoggerFactory)]
1012 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper client=
172.31.11.65:2181
1027 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to
connect to ZooKeeper
1031 INFO  (zkCallback-3-thread-1-processing-n:172.31.11.63:8983_solr) [
] o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@1de3d8b2
name:ZooKeeperConnection Watcher:172.31.11.65:2181 got event WatchedEvent
state:SyncConnected type:None path:null path:null type:None
1031 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client is connected to
ZooKeeper
1058 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updating cluster state from
ZooKeeper...
2085 INFO  (main) [   ] o.a.s.c.ZkController Register node as live in
ZooKeeper:/live_nodes/172.31.11.63:8983_solr
2089 INFO  (main) [   ] o.a.s.c.ZkController Found a previous node that
still exists while trying to register a new live node
/live_nodes/172.31.11.63:8983_solr - removing existing node to create
another.
2089 INFO  (main) [   ] o.a.s.c.c.SolrZkClient makePath:

Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hello Erick,

Thanks again for your time.

Here is as far as I have gone:

1. I started a fresh install and did the following:

[evert@nix]$ bin/solr start -e techproducts
[evert@nix]$ curl '
http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true'
-F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"

2. I am using only the Solr Admin UI to check the query respond, here is an
example:

Query:
http://nix.budhi.com.br:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

Result: {
  "responseHeader": {
"status": 0,
"QTime": 14,
"params": {
  "q": "nietava",
  "hl": "true",
  "hl.simple.post": "",
  "indent": "true",
  "fl": "id, author, content",
  "wt": "json",
  "hl.simple.pre": "",
  "_": "1450262674102"
}
  },
  "response": {
"numFound": 1,
"start": 0,
"docs": [
  {
"id": "pdf1",
"author": "Wander",
"content": [
  "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n\n
Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
\n  \n \n \n \n Ditado pelo Espírito \nAndré Luiz \n \n  \n \n \n \n \n \n
\n FEDERAÇÃO ESPÍRITA BRASILEIRA \nDEPARTAMENTO EDITORIAL \n \n Rua Souza
Valente, 17 \n20941-040 - Rio - RJ - Brasil \n \n  \nhttp://
www.febnet.org.br/  \n  \n \n   \n Francisco Cândido Xavier - Sexo e
Destino - pelo Espírito André Luiz \n \n  \n2 \n \n  \n \n \n \n Coleção
\n“A Vida no Mundo Espiritual” \n"
]
  }
]
  },
  "highlighting": {
"pdf1": {}
  }
}

** In the content field it brings back the whole PDF content (the book), and
notice that the highlighting section comes back empty.

I tried creating a new core with bin/solr create -c test, using the
schema.xml and solrconfig.xml standard found in
/solr/server/solr/configsets/basic_configs/conf

But even though... not working as expected (I think).


Would you know how to set this techproducts example to bring the snippets
of text?

The server only allows specific ip address for this port, if you would, I
could get it open for you to check.


Thanks again and best regards!




*Evert Ramos*
*evert.ra...@gmail.com *


2015-12-15 18:14 GMT-02:00 Erick Erickson :

> No, that's not what I meant. The highlight component adds a special
> section to the return packet that will contain "snippets" of text with
> highlights. You control how big those snippets are via various
> parameters in the highlight component and they'll have the tags you
> specify for highlighting.
>
> Your app needs to pull the information from the highlight portion of
> the response packet rather than the document list. Just execute your
> queries via cURL or a browser to see the structure of a response to
> see what I mean.
>
> And note that you do _not_ need to return the fields you're
> highlighting in the "fl" list so you do _not_ need to return the
> entire document contents.
>
> What are you using to display the results anyway?
>
> Best,
> Erick
>
> On Tue, Dec 15, 2015 at 10:02 AM, Evert R.  wrote:
> > Hi Erick,
> >
> > Thank you very much for the reply!!
> >
> > I do get back the full text, author, and a whole lot of stuff which
> doesn't
> > really matter for my project.
> >
> > So, what you are saying is that the solr gets me back the full content
> and
> > my application will fix the rest? Which means for me that all my books
> (pdf
> > files) when searching for an specific word it will bring me the whole
> book
> > content that has the requested query. And my application (php) in this
> > case... will take care of show only part of the text (such as in
> highlight,
> > as I was understandind) and hightlight the key word I was looking for?
> >
> > If so, Erick, you gave me a big help clearing out... I thought I would do
> > that with Solr in an easy way. =)
> >
> > Thanks for the attachements tip!
> >
> > Best regards,
> >
> > Evert
> >
> > 2015-12-15 14:56 GMT-02:00 Erick Erickson :
> >
> >> How are you trying to display the results? Highlighting is a bit of an
> >> odd beast. Assuming it's correctly configured, the response packet
> >> will have a separate highlight section, it's the application's
> >> responsibility to present that pleasingly.
> >>
> >> What _do_ you get back in the response?
> >>
> >> BTW, the mail server pretty aggressively strips attachments; yours
> >> didn't come through.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Dec 15, 2015 at 3:25 AM, Evert R. 
> wrote:
> >> > Hi there!
> >> >
> >> > It´s my first installation, not sure if here is the right channel...
> >> >
> >> > Here is my steps:
> >> >
> >> > 1. Set up a basic install of solr 5.4.0
> >> >
> >> > 2. Create a new core through command line (bin/solr create -c test)
> >> >
> >> > 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test 

minimum should match, cant explain the amount of hits

2015-12-16 Thread Ron van der Vegt

Hi,

I'm currently searching with the following query: q="sony+led+tv".
The minimum should match setting is set to: mm=2<65%.
So when there are more than two terms, at least 65% of the terms should
match.

I'm not using the StopFilterFactory.

When turning on debug, this is the parsedquery_toString:

+(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0 
| breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 | 
salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15 
(categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 | 
breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led | 
brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 | 
name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 | 
text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 | 
salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15


While I expect that at least two terms should match, because of the 65%,
I'm also getting hits on documents which seem to match only one of
the terms. Below is the explain of the hit, which shouldn't be there:


2.6449876 = sum of:
  2.6449876 = sum of:
2.6449876 = max plus 0.15 times others of:
  2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0
), product of:
  2.6449876 = idf(docFreq=3254, maxDocs=45833)
  1.0 = tfNorm, computed from:
1.0 = termFreq=1.0
1.0 = parameter k1
0.0 = parameter b (norms omitted for field)

When I change the mm to 2<67% then I get the number of results that I
expect with 65%, but if I understand correctly then all the terms should
match. (33,33% + 33,33% = 66,66% is always less than 67%). Did I miss
something, or is there something else that could affect the minimum
should match setting?


Thanks in advance!

Ron


Re: minimum should match, cant explain the amount of hits

2015-12-16 Thread Binoy Dalal
My guess is that Solr is rounding down while calculating the number of
mandatory terms.
In your case, there are 3 terms, 65% of which is 1.95, which rounded down is
1, but 67% is 2.01, which rounded down is 2, which conforms with the results
you're seeing.

Maybe someone else can confirm this.

On Wed, 16 Dec 2015, 16:56 Ron van der Vegt 
wrote:

> Hi,
>
> I'm currently searching with the following query: q="sony+led+tv".
> The minimum should match setting is set to: mm=2<65%.
> So when there are more than two terms, at least 65% of the terms should
> match.
> I'm not using the StopFilterFactory.
>
> When turning on debug, this is the parsedquery_toString:
>
> +(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0
> | breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 |
> salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15
> (categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 |
> breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led |
> brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 |
> name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 |
> text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 |
> salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15
>
> While I expect that at least two terms should match, because of the 65%,
> I'm also getting hits on documents which seem to match only one of
> the terms. Below is the explain of the hit, which shouldn't be there:
>
> 2.6449876 = sum of:
>2.6449876 = sum of:
>  2.6449876 = max plus 0.15 times others of:
>2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
>  2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0
> ), product of:
>2.6449876 = idf(docFreq=3254, maxDocs=45833)
>1.0 = tfNorm, computed from:
>  1.0 = termFreq=1.0
>  1.0 = parameter k1
>  0.0 = parameter b (norms omitted for field)
>
> When I change the mm to 2<67% then I get the number of results that I
> expect with 65%, but if I understand correctly then all the terms should
> match. (33,33% + 33,33% = 66,66% is always less than 67%). Did I miss
> something, or is there something else that could affect the minimum
> should match setting?
>
> Thanks in advance!
>
> Ron
>
-- 
Regards,
Binoy Dalal


Re: minimum should match, cant explain the amount of hits

2015-12-16 Thread Ron van der Vegt

Thanks! This makes sense, I will change my configuration to 2<-35%

On 16-12-15 13:11, Binoy Dalal wrote:

The edismax documentation confirms that when a positive % value is
provided, Solr will round down. If you want Solr to round up, set your
parameter value to '-35%'.

On Wed, 16 Dec 2015, 17:28 Binoy Dalal  wrote:


My guess is that Solr is rounding down while calculating the number of
mandatory terms.
In your case, there are 3 terms, 65% of which is 1.95, which rounded down
is 1, but 67% is 2.01, which rounded down is 2, which conforms with the
results you're seeing.

Maybe someone else can confirm this.

On Wed, 16 Dec 2015, 16:56 Ron van der Vegt 
wrote:


Hi,

I'm currently searching with the following query: q="sony+led+tv".
The minimum should match setting is set to: mm=2<65%.
So when there are more than two terms, at least 65% of the terms should
match.
I'm not using the StopFilterFactory.

When turning on debug, this is the parsedquery_toString:

+(((categoryName_snf:sony^5.0 | name:sony^6.0 | productTypeName:sony^8.0
| breadcrumbpath_snf:sony^2.0 | text:sony | title:sony^14.0 |
salestext:sony | brand:sony^8.0 | salesText_snf:sony)~0.15
(categoryName_snf:led^5.0 | name:led^6.0 | productTypeName:led^8.0 |
breadcrumbpath_snf:led^2.0 | text:led | title:led^14.0 | salestext:led |
brand:led^8.0 | salesText_snf:led)~0.15 (categoryName_snf:tv^5.0 |
name:tv^6.0 | productTypeName:tv^8.0 | breadcrumbpath_snf:tv^2.0 |
text:tv | title:tv^14.0 | salestext:tv | brand:tv^8.0 |
salesText_snf:tv)~0.15)~1) (title:"sony led tv"~10)~0.15

While I expect that at least two terms should match, because of the 65%,
I'm also getting hits on documents which seem to match only one of
the terms. Below is the explain of the hit, which shouldn't be there:

2.6449876 = sum of:
2.6449876 = sum of:
  2.6449876 = max plus 0.15 times others of:
2.6449876 = weight(text:led in 10143) [BM25Similarity], result of:
  2.6449876 = score(doc=10143,freq=1.0 = termFreq=1.0
), product of:
2.6449876 = idf(docFreq=3254, maxDocs=45833)
1.0 = tfNorm, computed from:
  1.0 = termFreq=1.0
  1.0 = parameter k1
  0.0 = parameter b (norms omitted for field)

When I change the mm to 2<67% then I get the number of results that I
expect with 65%, but if I understand correctly then all the terms should
match. (33,33% + 33,33% = 66,66% is always less than 67%). Did I miss
something, or is there something else that could affect the minimum
should match setting?

Thanks in advance!

Ron


--
Regards,
Binoy Dalal





Re: similarity as a parameter

2015-12-16 Thread Ahmet Arslan
Hi Markus,
I confirm (if that counts) that all current built-in similarities (except
SweetSpot) save the same stuff into the norms.

They can be switched/changed at search time. Actually, I am doing this today
with Lucene, experimenting with different term-weighting models using a single
index. It would be impractical to create a dedicated/separate index for each
term-weighting model (similarity).


The only assumption here is that their discountOverlaps setting should remain
the same as the one used at index time.

But of course we cannot make any assumptions regarding custom similarity 
implementations.

As Hoss pointed out, discountOverlaps affects the way document length is
calculated. It cannot be changed at search time.
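
For reference, a per-field-type similarity is declared in schema.xml - a
sketch (assuming solr.SchemaSimilarityFactory is configured as the global
similarity and that your BM25SimilarityFactory version accepts these
parameters):

  <fieldType name="text_bm25" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
    <similarity class="solr.BM25SimilarityFactory">
      <float name="k1">1.2</float>
      <float name="b">0.75</float>
      <!-- must match the value used at index time, per the point above -->
      <bool name="discountOverlaps">true</bool>
    </similarity>
  </fieldType>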

Thanks,
Ahmet






On Tuesday, December 15, 2015 9:43 PM, Chris Hostetter 
 wrote:


: Sweetspot does require reindexing but is that the only one? I have not 
: investigated some exotic implementations, anyone to confirm sweetspot is 
: the only one? In that case you could patch QueryComponent right, instead 
: of having a custom component?

I'm not sure where this thread developed this weird assumption that
switching from/to SweetSpotSimilarity in particular requires reindexing
but that many/most other Similarities wouldn't require this ...
SweetSpotSimilarity certainly has explicit config options for tuning the
index time field norm, but it's not a special case...

1) Solr shouldn't make any naive assumptions about whatever
arbitrary (custom) Similarity class a user might provide -- particularly
when it comes to field norms, since all of the Similarity base classes
/ callers have been set up to make it trivial for people to write custom
similarities for the express purpose of adjusting how many bits are used
by field norms.

2) In both ClassicSimilarity and BM25Similarity (the new default in Solr6) 
the config option "discountOverlaps" impacts what norm values get encoded 
at index time for a given field length -- so it's possible to break things 
w/o even switching what class you use, w/o even consider custom Similarity 
impls (or new out of the box similarity classes that might be added to 
Solr tomorow)


-Hoss
http://www.lucidworks.com/


Issues when indexing PDF files

2015-12-16 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.3.0

I'm indexing some PDF documents. However, for certain PDF files, there is
Chinese text in the documents, but after indexing, what is indexed in the
content is either a series of "??" or empty content.

I'm using the post.jar that comes together with Solr.

What could be the reason that causes this?

Regards,
Edwin


Re: Issues when indexing PDF files

2015-12-16 Thread Erik Hatcher
Edwin - Can you share one of those PDF files?

Also, drop the file into the Tika app and see what it sees directly - get the 
tika-app JAR and run that desktop application.

Could be an encoding issue?  
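
For example, something like this dumps the extracted text on the command
line (a sketch; the tika-app version number is illustrative):

  java -jar tika-app-1.11.jar --text Desmophen+670+BAe.pdf

If the Chinese characters already come out as "??" here, the problem is in
extraction/encoding rather than in Solr.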

Erik

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 



> On Dec 16, 2015, at 10:51 AM, Zheng Lin Edwin Yeo  
> wrote:
> 
> Hi,
> 
> I'm using Solr 5.3.0
> 
> I'm indexing some PDF documents. However, for certain PDF files, there is
> Chinese text in the documents, but after indexing, what is indexed in the
> content is either a series of "??" or empty content.
> 
> I'm using the post.jar that comes together with Solr.
> 
> What could be the reason that causes this?
> 
> Regards,
> Edwin



Re: Solr Basic Configuration - Highlight - Begginer

2015-12-16 Thread Evert R.
Hi Andrea,

ok, let´s do it:

1. It does have the 'nietava' term, so it brings back the only book (PDF
file) that has this word, and all its content, as in my previous message to
Erick, so the content field is there.

2. Using content:nietava it does not show any result, as below:

{ "responseHeader": { "status": 400, "QTime": 12, "params": { "q":
"contents:nietava", "indent": "true", "fl": "id", "wt": "json", "_":
"1450282631352" } }, "error": { "msg": "undefined field contents", "code":
400 } }

3. Here is what I found when grepping 'content' from the techproducts conf
folder:

schema.xml:
schema.xml:
schema.xml:
schema.xml:
solrconfig.xml: content_type
solrconfig.xml: content features title name
solrconfig.xml: 3
solrconfig.xml: 200
solrconfig.xml: content
solrconfig.xml: 750
solrconfig.xml: application/json
solrconfig.xml: application/csv
solrconfig.xml: text/plain; charset=UTF-8

and the grep on 'content_type':

schema.xml:   
schema.xml:   
solrconfig.xml:   content_type

=)

Thanks for checking out.



*Evert ​​*

2015-12-16 12:59 GMT-02:00 Andrea Gazzarini :

> hl=f.content.hl.content (I guess) is definitely wrong. Some questions:
>
>- First, sorry, the obvious question: are you sure the documents contain
>the "nietava" term?
>- Could you try to use q=content:nietaval?
>- Could you paste the definition (field & fieldtype) of the content
>field?
>
> > Should I have this configuration in the XML file?
>
> You could, but it's up to you and it strongly depends on your context. The
> simple thing is that if you have those parameters within the configuration
> you can avoid to pass them (as part of the requests), but probably in this
> phase, where you are testing, it's better to have them there (in the
> request).
>
> Andrea
>
> 2015-12-16 15:28 GMT+01:00 Evert R. :
>
> > Hi Andrea,
> >
> > Thanks for the reply!
> >
> > I tried with the hl.fl parameter as well, using as below:
> >
> >
> >
> http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+content&wt=json&indent=true&hl=true&
> hl.fl=f.content.hl.content%3D4&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> >
> > with the parameter under the hl field in the solr ui:
> >
> > 1. f.content.hl.snnipets=2
> > 2. f.content.hl.content=4
> > 3. content
> >
> > with no success...
> >
> > Should I have this configuration in the XML file?
> >
> > Regards,
> >
> > *Evert *
> >
> > 2015-12-16 11:23 GMT-02:00 Andrea Gazzarini :
> >
> > > Hi Evert,
> > > what is the configuration of the default request handler? Did you set
> the
> > > hl.fl parameter?
> > >
> > > Please check here [1] the parameters that the highlighting component
> > > expects. Required parameters should be in the query string or declared
> > > within the request handler which answers to your query.
> > >
> > > Andrea
> > >
> > > [1] https://wiki.apache.org/solr/HighlightingParameters
> > >
> > >
> > >
> > >
> > > 2015-12-16 12:51 GMT+01:00 Evert R. :
> > >
> > > > Hi everyone!
> > > >
> > > > I think I should not have posted my server name... never had that
> many
> > > > access attempts...
> > > >
> > > >
> > > >
> > > > 2015-12-16 9:03 GMT-02:00 Evert R. :
> > > >
> > > > > Hello Erick,
> > > > >
> > > > > Thanks again for your time.
> > > > >
> > > > > Here is as far as I have gone:
> > > > >
> > > > > 1. I started a fresh install and did the following:
> > > > >
> > > > > [evert@nix]$ bin/solr start -e techproducts
> > > > > [evert@nix]$ curl '
> > > > >
> > > >
> > >
> >
> http://localhost:8983/solr/techproducts/update/extract?literal.id=pdf1&commit=true
> > > > '
> > > > > -F "Emmanuel=@/home/solr/dados/teste/Emmanuel.pdf"
> > > > >
> > > > > 2. I am using only the Solr Admin UI to check the query respond,
> here
> > > is
> > > > > an example:
> > > > >
> > > > > Query:
> > > > > http://localhost:8983/solr/techproducts/select?q=nietava&fl=id%2C+author%2C+content&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
> > > > >
> > > > > Result: {
> > > > >   "responseHeader": {
> > > > > "status": 0,
> > > > > "QTime": 14,
> > > > > "params": {
> > > > >   "q": "nietava",
> > > > >   "hl": "true",
> > > > >   "hl.simple.post": "",
> > > > >   "indent": "true",
> > > > >   "fl": "id, author, content",
> > > > >   "wt": "json",
> > > > >   "hl.simple.pre": "",
> > > > >   "_": "1450262674102"
> > > > > }
> > > > >   },
> > > > >   "response": {
> > > > > "numFound": 1,
> > > > > "start": 0,
> > > > > "docs": [
> > > > >   {
> > > > > "id": "pdf1",
> > > > > "author": "Wander",
> > > > > "content": [
> > > > >       "André Luiz - Sexo e Destino _Chico e Waldo_.doc \n \n \n
> > > > > Francisco Cândido Xavier \ne \n \n Waldo Vieira \n \n \n \n \n Sexo e
> > > > > Destino \n \n \n \n 12o livro da Coleção \n“A Vida no Mundo Espiritual” \n
> > > > > \n  \n \n \n \n 

Re: Security Problems

2015-12-16 Thread Noble Paul
I don't think this behavior is intuitive. It is very easy to misunderstand.

I would rather just add a flag to the "authentication" plugin section
which says "blockUnauthenticated" : true

which means all unauthenticated requests must be blocked.
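
For illustration, a security.json along those lines might look roughly like this (the flag is only a proposal here, so its name and placement are speculative):

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnauthenticated": true,
    "credentials": { "solr": "<base64 sha256 hash> <base64 salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [ { "name": "read", "role": "reader" } ],
    "user-role": { "solr": "reader" }
  }
}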




On Tue, Dec 15, 2015 at 7:09 PM, Jan Høydahl  wrote:
> Yes, that’s why I believe it should be:
> 1) if only authentication is enabled, all users must authenticate and all 
> authenticated users can do anything.
> 2) if authz is enabled, then all users must still authenticate, and can by 
> default do nothing at all, unless assigned proper roles
> 3) if a user is assigned the default “read” rule, and a collection adds a 
> custom “/myselect” handler, that one is unavailable until the user gets it 
> assigned
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>> 14. des. 2015 kl. 14.15 skrev Noble Paul :
>>
>> ". If all paths were closed by default, forgetting to configure a path
>> would not result in a security breach like today."
>>
>> But it will still mean that unauthorized users are able to access,
>> like guest being able to post to "/update". Just authenticating is not
>> enough without proper authorization
>>
>> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
 1) "read" should cover all the paths
>>>
>>> This is very fragile. If all paths were closed by default, forgetting to 
>>> configure a path would not result in a security breach like today.
>>>
>>> /Jan
>>
>>
>>
>> --
>> -
>> Noble Paul
>



-- 
-
Noble Paul


RE: DIH Caching w/ BerkleyBackedCache

2015-12-16 Thread Todd Long
James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?
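
(In case it helps anyone hitting the same thing: the cleaner is governed by JE environment properties. I have not verified that the DIH Berkeley cache exposes a way to set them, so this is the plain JE API, shown only to name the relevant knobs:)

import java.io.File;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class JeCleanerSettings {
  public static void main(String[] args) {
    EnvironmentConfig cfg = new EnvironmentConfig();
    cfg.setAllowCreate(true);
    // Keep cleaned log files on disk as *.del instead of deleting them (useful for diagnosis)
    cfg.setConfigParam("je.cleaner.expunge", "false");
    // Default is 50; lowering it makes the cleaner reclaim log files less eagerly
    cfg.setConfigParam("je.cleaner.minUtilization", "25");
    Environment env = new Environment(new File("/path/to/dih-cache"), cfg);
    env.close();
  }
}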

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but is just incredibly slow... I can get a full
index without caching in about two hours; however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4245777.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Issues when indexing PDF files

2015-12-16 Thread Zheng Lin Edwin Yeo
Hi Erik,

I've shared the file on dropbox, which you can access via the link here:
https://www.dropbox.com/s/rufi9esmnsmzhmw/Desmophen%2B670%2BBAe.pdf?dl=0

This is what I get from the Tika app after dropping the file in.

Content-Length: 75092
Content-Type: application/pdf
Type: COSName{Info}
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-TIKA:digest:MD5: de67120e29ec7ffa24aec7e17104b6bf
X-TIKA:digest:SHA256:
d0f04580d87290c1bc8068f3d5b34d797a0d8ccce2b18f626a37958c439733e7
access_permission:assemble_document: true
access_permission:can_modify: true
access_permission:can_print: true
access_permission:can_print_degraded: true
access_permission:extract_content: true
access_permission:extract_for_accessibility: true
access_permission:fill_in_form: true
access_permission:modify_annotations: true
dc:format: application/pdf; version=1.3
pdf:PDFVersion: 1.3
pdf:encrypted: false
producer: null
resourceName: Desmophen+670+BAe.pdf
xmpTPg:NPages: 3


Regards,
Edwin


On 17 December 2015 at 00:15, Erik Hatcher  wrote:

> Edwin - Can you share one of those PDF files?
>
> Also, drop the file into the Tika app and see what it sees directly - get
> the tika-app JAR and run that desktop application.
>
> Could be an encoding issue?
>
> Erik
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com 
>
>
>
> > On Dec 16, 2015, at 10:51 AM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > I'm using Solr 5.3.0
> >
> > I'm indexing some PDF documents. However, for certain PDF files there is
> > Chinese text in the documents, but after indexing, what is indexed in the
> > content is either a series of "??" or empty content.
> >
> > I'm using the post.jar that comes together with Solr.
> >
> > What could be the reason that causes this?
> >
> > Regards,
> > Edwin
>
>


Re: Ugh! My term is the entire record

2015-12-16 Thread Mark Fenbers

Yup! That was it!  Thanks!
(I changed "string" to "text_en" in my backup copy, too, so this doesn't 
happen again.)
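
(For reference, the change amounts to this kind of edit in schema.xml; the field name here is made up:)

<field name="logtext" type="text_en" indexed="true" stored="true"/>
<!-- was: <field name="logtext" type="string" indexed="true" stored="true"/> -->

followed by a core reload and a full reindex, since the analysis only happens at index time.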

Mark

On 12/16/2015 10:44 AM, Binoy Dalal wrote:

What is the type of the fields in question?
What you're seeing will happen if a field is of type string. If this is the
case then try changing your field type to text_en or text_general depending
on your requirements.

On Wed, 16 Dec 2015, 19:51 Mark Fenbers  wrote:





warning while indexing

2015-12-16 Thread Midas A
Getting the following warning while indexing. Can anybody please tell me the reason?


java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@9916a67
rejected from 
java.util.concurrent.ScheduledThreadPoolExecutor@79f8b5f[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
2046]
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at 
java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
at 
java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
at 
org.apache.solr.update.CommitTracker._scheduleCommitWithin(CommitTracker.java:150)
at 
org.apache.solr.update.CommitTracker._scheduleCommitWithinIfNeeded(CommitTracker.java:118)
at 
org.apache.solr.update.CommitTracker.addedDocument(CommitTracker.java:169)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:231)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:451)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:587)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
at 
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)


Custom auth plugin not loaded in SolrCloud

2015-12-16 Thread Kristine Jetzke
Hi,
 
I'm trying to include a custom authentication plugin in my SolrCloud 
installation. It only works when I add it to 
server\solr-webapp\webapp\WEB-INF\lib or to the solr home directory of each 
node. 
 
If I add it as described here
https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
it cannot find my class at all. What I did:
- Started node with the enable runtime lib flag: solr start -cloud -s "my-dir" -z
localhost:2181 -Denable.runtime.lib=true
- Created .system collection as described here:
https://cwiki.apache.org/confluence/display/solr/Blob+Store+API
- Uploaded jar as described here:
https://cwiki.apache.org/confluence/display/solr/Blob+Store+API
- Added jar as runtime lib to my collection as described here:
https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
- Checked in configoverlay.json that jar was added
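
(For reference, the upload and registration steps correspond to commands like these; collection and jar names are placeholders:)

curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @myplugin.jar "http://localhost:8983/solr/.system/blob/myplugin"
curl "http://localhost:8983/solr/mycollection/config" -H 'Content-type:application/json' -d '{"add-runtimelib": {"name":"myplugin", "version":1}}'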
 
I had a brief look at the code and noticed that
org.apache.solr.core.CoreContainer.initializeAuthorizationPlugin(Map<String, Object>)
uses the SolrResourceLoader to load the plugin. However, in the wiki
it says that "The default SolrResourceLoader does not have visibility to the 
jars that have been defined as runtime libraries."
 
Is there anything I can do to make it work?
 
Thanks,
 
tine


Von meinem iPhone gesendet

Re: Is DIH going to be removed from Solr future versions?

2015-12-16 Thread Alexandre Rafalovitch
Are you saying to build a local mini-collection and then mirror the final result
to the real one?

What about deletions? Per-entry cleanup statements and so on? DIH does full
updates, not just additions.

Or did I miss the focus?

Regards,
Alex
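
(For the record, the forwarding processor Erik sketches below might take roughly this shape. This is a sketch against the public SolrJ/UpdateRequestProcessor APIs, not an existing class; deletes would need the same treatment in processDelete, which is exactly the question above:)

import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class ForwardingUpdateProcessor extends UpdateRequestProcessor {
  private final SolrClient remote; // e.g. a CloudSolrClient pointing at the real collection

  public ForwardingUpdateProcessor(SolrClient remote, UpdateRequestProcessor next) {
    super(next);
    this.remote = remote;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    try {
      // Forward the fully built document instead of indexing it locally;
      // SolrJ handles the javabin transport.
      remote.add(cmd.getSolrInputDocument());
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }
}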
On 15 Dec 2015 11:46 pm, "Erik Hatcher"  wrote:

> With time shaken loose, IMO ideally what we do (under
> https://issues.apache.org/jira/browse/SOLR-7188 <
> https://issues.apache.org/jira/browse/SOLR-7188> probably) is create an
> update processor that *forwards* to a _real_ Solr collection update
> handler, and fire up EmbeddedSolrServer in a client-side command-line tool
> that can run /update/extract, DIH stuff, etc - does what it does now to
> extract, parse, and build documents and then forwards them via javabin to a
> live Solr collection.   I’m not sure that SOLR-7188 currently spells it out
> like that, but it is a nice, clean, straightforward path from DIH and Tika
> embedded inside a real Solr cluster to leveraging and scaling it on its
> own.   We’d lose the DIH admin UI, but that’s ok by me.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com 
>
>
>
> > On Dec 15, 2015, at 9:23 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
> >
> > I am aware of the problems with the implementation of DIH, but is there
> any problem with the XML driven data import capability?
> > Could it be rewritten (using modern XPath) to run as a part of SolrJ?
> >
> > I've been interested in that, but I just haven't been able to shake
> loose the time.
> >
> > -Original Message-
> > From: Upayavira [mailto:u...@odoko.co.uk]
> > Sent: Tuesday, December 15, 2015 5:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Is DIH going to be removed from Solr future versions?
> >
> > I doubt DIH will be "removed". It more likely will be relegated - still
> there, but emphasised less.
> >
> > Another possibility that has been mooted is to extract it, so that it
> can run outside of Solr. This strikes me as the best option. Having it run
> inside Solr strikes me as architecturally wrong, and also problematic in a
> SolrCloud world. Taking the DIH codebase and running it
> > *outside* Solr you get the best of DIH without the same set of issues.
> >
> > Upayavira
> >
> > On Tue, Dec 15, 2015, at 05:47 AM, Anil Cherian wrote:
> >> Dear Team,
> >>
> >> I use DIH extensively and even wrote my own custom transformers in
> >> some situations.
> >> Recently during an architecture discussion one of my team members told
> >> that Solr is going to take away DIH from its future versions.
> >>
> >> Is that true?
> >>
> >> Also is using DIH for say 2 or 3 million docs a good option for
> >> indexing an XML content data set. I am planning to use it either by
> >> calling separate entities parallely or multiple /dataimport in
> >> solrconfig.xml.
> >>
> >> Cld you please reply at your earliest convenience as it is an
> >> important decision for us to continue on DIH or not!
> >>
> >> Thanks and Rgds,
> >> Anil.
>
>


Re: integrate solr with preprocessor tools

2015-12-16 Thread sara hajili
hi Emir, thanks for answering.
Now my question is: how do I write this class?
Must I use Solr interfaces?
I saw in the above link that I can use a Solr analyzer, but how do I use it?
Please tell me how to start writing my own analyzer step by step:
which interfaces can I use and extend to achieve my goal?
Thanks
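
(A skeleton of what Emir suggests below, a custom token filter plus its factory, looks roughly like this; all class names are placeholders and normalize() stands in for the external Persian tool:)

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class MyPersianNormalizerFilterFactory extends TokenFilterFactory {
  public MyPersianNormalizerFilterFactory(Map<String, String> args) {
    super(args);
  }

  @Override
  public TokenStream create(TokenStream input) {
    return new MyPersianNormalizerFilter(input);
  }
}

final class MyPersianNormalizerFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  MyPersianNormalizerFilter(TokenStream in) {
    super(in);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // Apply the external normalizer to the current token
    String normalized = normalize(termAtt.toString());
    termAtt.setEmpty().append(normalized);
    return true;
  }

  private String normalize(String term) {
    return term; // placeholder: call your preprocessing tool here
  }
}

The factory is then referenced from the field type's analyzer chain, e.g. <filter class="com.example.MyPersianNormalizerFilterFactory"/>, with the jar placed on Solr's classpath.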

On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> You need to wrap your code in tokenizer or token filter
> https://wiki.apache.org/solr/SolrPlugins
>
> If you want to improve existing and believe others can benefit from
> improvement, you can open ticket and submit patch.
>
> Thanks,
> Emir
>
>
> On 09.12.2015 10:41, sara hajili wrote:
>
>> hi i wanna to use solr , and language of my documents that i stored in
>> solr
>> is persian.
>> solr doesn't support persian as well as i want.so i find preprocessor
>> tools
>> like a normalization,tockenizer and etc ...
>> i don't want to use solr persian filter like persian tockenizer,i mean i
>> wanna to improve it.
>>
>> now my question is how i can integrate solr with this external
>> preprocessor
>> tools??
>>
>> tnx
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: pf2 pf3 and stopwords

2015-12-16 Thread elisabeth benoit
Thanks for your answer.

Actually, using a slop of 1 is something I can't do (because of other
specifications)

I guess I'll index differently.
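
(One way to do that without touching the main field, as a sketch; the field name text_nostop is hypothetical:)

<copyField source="text" dest="text_nostop"/>
<!-- text_nostop uses the same analysis chain minus the StopFilterFactory -->

and then point the phrase boosts at the stopword-free field:

pf2=text_nostop^5 pf3=text_nostop^10

That way "Gare de Saint Lazare" keeps its 'de' token at index time, so the bigrams/trigrams needed for the pf2/pf3 boost still form with slop 0.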

Best regards,
Elisabeth

2015-12-14 16:24 GMT+01:00 Binoy Dalal :

> Moreover, the stopword de will work on your queries and not on your
> documents, meaning if you query 'Gare de Saint Lazare', the terms actually
> searched for will be Gare Saint and Lazare, 'de' will be filtered out.
>
> On Mon, Dec 14, 2015 at 8:49 PM Binoy Dalal 
> wrote:
>
> > This isn't a bug. During pf3 matching, since your query has only three
> > tokens, the entire query will be treated as a single phrase, and with
> slop
> > = 0, any word that comes in the middle of your query  - 'de' in this case
> > will cause the phrase to not be matched. If you want to get around this,
> > try setting your slop = 1 in which case it should match Gare Saint Lazare
> > even with the de in it.
> >
> > On Mon, Dec 14, 2015 at 7:22 PM elisabeth benoit <
> > elisaelisael...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I am using solr 4.10.1. I have a field with stopwords
> >>
> >>
> >>  <filter class="solr.StopFilterFactory" words="stopwords.txt"
> >> enablePositionIncrements="true"/>
> >>
> >> And I use pf2 pf3 on that field with a slop of 0.
> >>
> >> If the request is "Gare Saint Lazare", and I have a document "Gare de
> >> Saint
> >> Lazare", "de" being a stopword, this document doesn't get the pf3 boost,
> >> because of "de".
> >>
> >> I was wondering, is this normal? is this a bug? is something wrong with
> my
> >> configuration?
> >>
> >> Best regards,
> >> Elisabeth
> >>
> > --
> > Regards,
> > Binoy Dalal
> >
> --
> Regards,
> Binoy Dalal
>


Re: Partial sentence match with block join

2015-12-16 Thread Yangrui Guo
For example:

If company A is { name:"Apple Inc", location:"Los Alamos"} and company B is
{ name:"Banana Inc", location:"Los Angeles"}, then if you only want to
retrieve company A you must use "Apple AND Inc AND Los AND Alamos",
otherwise it will also retrieve company B. However, if you use AND for all
terms then partial matching isn't possible. This seems contradictory.
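
(To make that concrete, a sketch using the standard {!parent} block-join parser, with a hypothetical doc_type flag and a child field named text:)

q={!parent which="doc_type:parent"}(+text:apple +text:inc +text:los +text:alamos)

All four mandatory (+) clauses must match within one and the same child document, so only company A is returned, but a document missing any one term is dropped. Relaxing the clauses to optional would let company B match on "Los" alone, which is exactly the trade-off described above.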

On Tuesday, December 15, 2015, Upayavira  wrote:

>
> Cab you give an example? I cannot understand what you mean from your
> description below.
>
> Thx!
>
> On Wed, Dec 16, 2015, at 12:42 AM, Yangrui Guo wrote:
> > This will be a very common situation. Amazon and Google now display
> > keywords missing in the document. However it seems that Solr parent-child
> > structure requires to use "AND" to confine all terms appear inside a
> > single
> > child document, otherwise it will totally disregard the parent-child
> > structure. Is there a way to achieve this?
> >
> > On Tuesday, December 15, 2015, Jack Krupansky  >
> > wrote:
> >
> > > Set the default operator to OR and optionally set the mm parameter to
> 2 to
> > > require at least two of the query terms to match, and don't quote the
> terms
> > > as a phrase unless you want an exact (optionally sloppy) match.
> > >
> > > Interesting example since I'll bet there are a lot of us who still
> think of
> > > the company as being named "Apple Computer" even though they dropped
> > > "Computer" from the name back in 2007. Also, it is "Inc.", not
> "Company",
> > > so a proper search would be for "Apple Inc." or the old "Apple
> Computer,
> > > Inc."
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Tue, Dec 15, 2015 at 2:35 AM, Yangrui Guo  
> > > > wrote:
> > >
> > > > Hello
> > > >
> > > > I've been using 5.3.1. I would like to enable this feature: when user
> > > > enters a query, the results should include documents that also
> partially
> > > > match the query. For example, the document is Apple
> Company
> > > > and user query is "apple computer company". Though the document is
> > > missing
> > > > the term "computer". I've tried phrase slop but it doesn't seem to be
> > > > working with block join. How can I do this in solr?
> > > >
> > > > Thanks
> > > >
> > > > Yangrui
> > > >
> > >
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-12-16 Thread Vincenzo D'Amore
Hi all,

given that Solr 5.4 is finally released, is this the most stable and
efficient version of SolrCloud?

I have a website which receives many search requests. It normally serves
about 2000 concurrent requests, but sometimes there are peaks from 4000 to
1 requests in a few seconds.

In January I'll have a chance to upgrade my old SolrCloud 4.8.1 cluster to
a brand new version, but following this thread I read about the problems
that can occur when upgrading to the latest version.

I have seen that issue SOLR-7730 "speed-up faceting on doc values fields"
is fixed in 5.4.

I'm using standard faceting without docValues. Should I add docValues in
order to benefit from that fix?
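
(Concretely, that would be a schema change along these lines; the field name is hypothetical, and note that adding docValues requires a full reindex:)

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>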

Best regards,
Vincenzo



On Thu, Oct 8, 2015 at 2:22 PM, Mikhail Khludnev  wrote:

> Uwe, it's good to know! I mean that you've recovered. Take care!
>
> On Thu, Oct 8, 2015 at 1:24 PM, Uwe Reh 
> wrote:
>
> > Sorry for the delay. I had an ugly flu.
> >
> > SOLR-7730 seems to work fine. Using docValues with Solr
> > 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again.
> > (90ms vs. 2ms) :-)
> >
> > Thanks
> > Uwe
> >
> >
> >
> >
> > Am 27.09.2015 um 20:32 schrieb Mikhail Khludnev:
> >
> >> On Sun, Sep 27, 2015 at 2:00 PM, Uwe Reh 
> >> wrote:
> >>
> >> When 5.4 with SOLR-7730 will be released, I will start to use docValues.
> >>> Going this way, seems more straight forward to me.
> >>>
> >>
> >>
> >> Sure. Giving your answers docValues facets has a really good chance to
> >> perform in your index after SOLR-7730. It's really interesting to see
> >> performance numbers on early 5.4 builds:
> >>
> >>
> https://builds.apache.org/view/All/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
> >>
> >>
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: pf2 pf3 and stopwords

2015-12-16 Thread Binoy Dalal
What is your exact use case?

On Wed, 16 Dec 2015, 13:40 elisabeth benoit 
wrote:

> Thanks for your answer.
>
> Actually, using a slop of 1 is something I can't do (because of other
> specifications)
>
> I guess I'll index differently.
>
> Best regards,
> Elisabeth
>
> 2015-12-14 16:24 GMT+01:00 Binoy Dalal :
>
> > Moreover, the stopword de will work on your queries and not on your
> > documents, meaning if you query 'Gare de Saint Lazare', the terms
> actually
> > searched for will be Gare Saint and Lazare, 'de' will be filtered out.
> >
> > On Mon, Dec 14, 2015 at 8:49 PM Binoy Dalal 
> > wrote:
> >
> > > This isn't a bug. During pf3 matching, since your query has only three
> > > tokens, the entire query will be treated as a single phrase, and with
> > slop
> > > = 0, any word that comes in the middle of your query  - 'de' in this
> case
> > > will cause the phrase to not be matched. If you want to get around
> this,
> > > try setting your slop = 1 in which case it should match Gare Saint
> Lazare
> > > even with the de in it.
> > >
> > > On Mon, Dec 14, 2015 at 7:22 PM elisabeth benoit <
> > > elisaelisael...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> I am using solr 4.10.1. I have a field with stopwords
> > >>
> > >>
> > >>  <filter class="solr.StopFilterFactory" words="stopwords.txt"
> > >> enablePositionIncrements="true"/>
> > >>
> > >> And I use pf2 pf3 on that field with a slop of 0.
> > >>
> > >> If the request is "Gare Saint Lazare", and I have a document "Gare de
> > >> Saint
> > >> Lazare", "de" being a stopword, this document doesn't get the pf3
> boost,
> > >> because of "de".
> > >>
> > >> I was wondering, is this normal? is this a bug? is something wrong
> with
> > my
> > >> configuration?
> > >>
> > >> Best regards,
> > >> Elisabeth
> > >>
> > > --
> > > Regards,
> > > Binoy Dalal
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal


Re: SolrCloud 4.8.1 - commit wait

2015-12-16 Thread Vincenzo D'Amore
Hi,

an update. Hope you can help me.

I have stopped all the other working collections, in order to have a clean
log file.

At 11:01:16 a hard commit was issued:

2015-12-16 11:01:49,839 [http-bio-8080-exec-824] INFO
 org.apache.solr.update.UpdateHandler - start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

At 11:11:31,344 the commit completed.

The commit ended with this log line; I suppose 615021 is the elapsed time
in milliseconds (roughly 10 minutes):

2015-12-16 11:11:31,343 [http-bio-8080-exec-991] INFO
 o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard2_replica3]
webapp=/solr path=/update
params={waitSearcher=true&commit=true&softCommit=false&wt=javabin&version=2}
{commit=} 0 615021

During these 10 minutes the server logged "only" these lines; looking at
them I don't see anything strange:

2015-12-16 11:01:50,705 [http-bio-8080-exec-824] INFO
 o.a.solr.search.SolrIndexSearcher - Opening
Searcher@6d5c31e2[catalogo_shard1_replica2]
main
2015-12-16 11:01:50,724 [http-bio-8080-exec-824] INFO
 org.apache.solr.update.UpdateHandler - end_commit_flush
2015-12-16 11:02:20,722 [searcherExecutor-108-thread-1] INFO
 o.a.solr.spelling.suggest.Suggester - build()
2015-12-16 11:02:21,846 [http-bio-8080-exec-824] INFO
 o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard1_replica2]
webapp=/solr path=/update
params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://192.168.101.118:8080/solr/catalogo_shard2_replica3/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false}
{commit=} 0 32007
2015-12-16 11:05:47,162 [http-bio-8080-exec-1037] INFO
 org.apache.solr.update.UpdateHandler - start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2015-12-16 11:05:47,970 [http-bio-8080-exec-1037] INFO
 o.a.solr.search.SolrIndexSearcher - Opening
Searcher@4ede7ac5[catalogo_shard2_replica3]
main
2015-12-16 11:05:47,989 [http-bio-8080-exec-1037] INFO
 org.apache.solr.update.UpdateHandler - end_commit_flush
2015-12-16 11:06:03,063 [commitScheduler-115-thread-1] INFO
 org.apache.solr.update.UpdateHandler - start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2015-12-16 11:06:03,896 [commitScheduler-115-thread-1] INFO
 o.a.solr.search.SolrIndexSearcher - Opening
Searcher@2bf4fd3a[catalogo_shard3_replica1]
realtime
2015-12-16 11:06:03,913 [commitScheduler-115-thread-1] INFO
 org.apache.solr.update.UpdateHandler - end_commit_flush
2015-12-16 11:06:19,435 [searcherExecutor-111-thread-1] INFO
 o.a.solr.spelling.suggest.Suggester - build()
2015-12-16 11:06:20,589 [http-bio-8080-exec-1037] INFO
 o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard2_replica3]
webapp=/solr path=/update
params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://192.168.101.118:8080/solr/catalogo_shard2_replica3/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false}
{commit=} 0 33427
2015-12-16 11:08:07,076 [http-bio-8080-exec-1037] INFO
 org.apache.solr.update.UpdateHandler - start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2015-12-16 11:08:07,076 [http-bio-8080-exec-1037] INFO
 org.apache.solr.update.UpdateHandler - No uncommitted changes. Skipping
IW.commit.
2015-12-16 11:08:07,076 [http-bio-8080-exec-1037] INFO
 o.a.solr.search.SolrIndexSearcher - Opening
Searcher@75b2727f[catalogo_shard3_replica1]
main
2015-12-16 11:08:07,084 [http-bio-8080-exec-1037] INFO
 org.apache.solr.update.UpdateHandler - end_commit_flush
2015-12-16 11:08:39,040 [searcherExecutor-114-thread-1] INFO
 o.a.solr.spelling.suggest.Suggester - build()
2015-12-16 11:08:40,286 [http-bio-8080-exec-1037] INFO
 o.a.s.u.processor.LogUpdateProcessor - [catalogo_shard3_replica1]
webapp=/solr path=/update
params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://192.168.101.118:8080/solr/catalogo_shard2_replica3/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false}
{commit=} 0 33211

Could some component be the cause of this wait? Something like a suggester
or a spellchecker cache?
But if so, I should see the activity in the log file, shouldn't I?
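
The repeated "Suggester - build()" lines above do point at such a component: a suggester configured with buildOnCommit=true is rebuilt from scratch on every commit, which can easily take minutes on a large index. The snippet to look for in solrconfig.xml is something like this (names illustrative):

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>  <!-- full rebuild on every commit -->
  </lst>
</searchComponent>

Setting buildOnCommit to false and building explicitly (spellcheck.build=true) when needed is one way to confirm whether this is the cause.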

Best regards,
Vincenzo


On Sat, Dec 12, 2015 at 7:50 PM, Erick Erickson 
wrote:

> Autowarm times will only happen when the commit has openSearcher=true
> or on a soft commit. But maybe your log levels aren't at INFO for the right
> code...
>
> That said, your autowarm counts at 0 probably means that you're not seeing
> any autowarming really, so that might be a red herring. Your newSearcher
> event in solrconfig.xml will still be fired, but may be commented out.
>
> This is still something of a puzzle. With an index this size, your hard
> commits should never take more than a second or two unless you're
> in some very strange state. Stack traces would be in order if lengthening
> the commit interval doesn't work.
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 5:58 PM, Vincenzo D'Amore 
> wrote:
> >