Re: query parsing

2015-09-24 Thread Upayavira
typically, the index dir is inside the data dir. Delete the index dir
and you should be good. If there is a tlog next to it, you might want to
delete that also.

If you don't have a data dir, I wonder whether you set the data dir when
creating your core or collection. Typically the instance dir and data
dir aren't needed.
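With Solr stopped, the cleanup described above amounts to removing the index (and tlog) directories under the core's data dir. A minimal sketch — the data-dir path below is a placeholder, not a path from this thread:

```python
import shutil
from pathlib import Path

def wipe_core_index(data_dir):
    """Remove the index and tlog dirs under a core's data dir.

    Run this only with Solr stopped; on the next startup the core
    comes up empty and can be re-indexed from scratch.
    """
    removed = []
    for sub in ("index", "tlog"):
        target = Path(data_dir) / sub
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(sub)
    return removed

# Hypothetical path -- adjust to your core's actual data dir:
print(wipe_core_index("/var/solr/data/mycore/data"))
```

If the path doesn't exist, the function is a harmless no-op and returns an empty list.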

Upayavira

On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
> -zkRun command when you start Solr or the -zkHost; highly unlikely. On
> the
> admin page there would be a "cloud" link on the left side, I really doubt
> one's there.
> 
> You should have a data directory, it should be the parent of the index
> and
> tlog directories. As a sanity check, try looking at the analysis page.
> Type
> a bunch of words in the left hand side indexing box and uncheck the
> verbose
> box. As you can tell I'm grasping at straws. I'm still puzzled why you
> don't have a "data" directory here, but that shouldn't really matter. How
> did you create this index? I don't mean data import handler, rather how did
> you create the core that you're indexing to?
> 
> Best,
> Erick
> 
> On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers 
> wrote:
> 
> > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> >
> >> Then my next guess is you're not pointing at the index you think you are
> >> when you 'rm -rf data'
> >>
> >> Just ignore the Elall field for now I should think, although get rid of it
> >> if you don't think you need it.
> >>
> >> DIH should be irrelevant here.
> >>
> >> So let's back up.
> >> 1> go ahead and "rm -fr data" (with Solr stopped).
> >>
> > I have no "data" dir.  Did you mean "index" dir?  I removed 3 index
> > directories (2 for spelling):
> > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> >
> >> 2> start Solr
> >> 3> do NOT re-index.
> >> 4> look at your index via the schema-browser. Of course there should be
> >> nothing there!
> >>
> > Correct!  It said "there is no term info :("
> >
> >> 5> now kick off the DIH job and look again.
> >>
> > Now it shows a histogram, but most of the "terms" are long -- the full
> > texts of (the table.column) eventlogtext.logtext, including the whitespace
> > (with %0A used for newline characters)...  So, it appears it is not being
> > tokenized properly, correct?
> >
> >> Your logtext field should have only single tokens. The fact that you have
> >> some very
> >> long tokens presumably with whitespace) indicates that you aren't really
> >> blowing
> >> the index away between indexing.
> >>
> > Well, I did this time for sure.  I verified that initially, because it
> > showed there was no term info until I DIH'd again.
> >
> >> Are you perhaps in Solr Cloud with more than one replica?
> >>
> > Not that I know of, but being new to Solr, there could be things going on
> > that I'm not aware of.  How can I tell?  I certainly didn't set anything up
> > for solrCloud deliberately.
> >
> >> In that case you
> >> might be getting the index replicated on startup assuming you didn't
> >> blow away all replicas. If you are in SolrCloud, I'd just delete the
> >> collection and
> >> start over, after insuring that you'd pushed the configset up to
> >> Zookeeper.
> >>
> >> BTW, I always look at the schema.xml file from the Solr admin window just
> >> as
> >> a sanity check in these situations.
> >>
> > Good idea!  But the one shown in the browser is identical to the one I've
> > been editing!  So that's not an issue.
> >
> >


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Uwe Reh

Am 23.09.2015 um 10:02 schrieb Mikhail Khludnev:

...
Accelerating non-DV facets is not so clear so far. Please show profiler
snapshot for non-DV facets if you wish to go this way.


Hi,

attached is a VisualVM profile of several runs of a simplified query (just 
one facet):

http://xyz/solr/hebis/select/?q=*:*=true=1=30=author_facet=true


The average "QTime" for the query is ~5 seconds:


  5254.0
  0.0
  5253.0
  0.0
  0.0
  0.0
  0.0



The profile was made with Solr 5.3 running a 4.10 index with no 
'docValues' at all in the schema. (A native 5.3 index with docValues is 
still building.)


For me it's surprising that so many "docValues" methods show up in the 
profile.


Uwe

PS.
Meanwhile I tried 5.1 and got the same behavior.
"Hot Spots - Method";"Self Time [%]";"Self Time";"Self Time (CPU)";"Total 
Time";"Total Time (CPU)";"Samples"
"sun.nio.ch.ServerSocketChannelImpl.accept()";"29.757507";"911411.696 
ms";"227852.922 ms";"911411.696 ms";"227852.922 ms";"4"
"sun.nio.ch.SelectorImpl.select()";"29.751842";"911238.171 ms";"911238.171 
ms";"911238.171 ms";"911238.171 ms";"6"
"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos()";"12.056136";"369254.84
 ms";"0.0 ms";"369254.84 ms";"0.0 ms";"73"
"java.lang.Object.wait()";"7.439377";"227852.924 ms";"0.0 ms";"227852.924 
ms";"0.0 ms";"1"
"java.net.ServerSocket.accept()";"7.439377";"227852.924 ms";"0.0 
ms";"227852.924 ms";"0.0 ms";"1"
"java.util.HashMap.put()";"3.8150647";"116847.636 ms";"116847.636 
ms";"116847.636 ms";"116847.636 ms";"873"
"java.util.TreeMap.put()";"2.946289";"90238.818 ms";"90238.818 ms";"90238.818 
ms";"90238.818 ms";"180"
"org.apache.lucene.index.FieldInfos$Builder.addOrUpdateInternal()";"2.1034875";"64425.528
 ms";"64425.528 ms";"183450.033 ms";"183450.033 ms";"113"
"java.util.Collections$UnmodifiableCollection$1.next()";"0.8864094";"27148.909 
ms";"27148.909 ms";"27148.909 ms";"27148.909 ms";"41"
"java.util.TreeMap$EntryIterator.next()";"0.81940365";"25096.661 ms";"25096.661 
ms";"25096.661 ms";"25096.661 ms";"26"
"java.util.HashMap.get()";"0.66768044";"20449.689 ms";"20449.689 ms";"20449.689 
ms";"20449.689 ms";"159"
"org.apache.solr.request.DocValuesFacets.accumMultiSeg()";"0.42119572";"12900.365
 ms";"12900.365 ms";"32423.444 ms";"32423.444 ms";"23"
"org.apache.lucene.util.packed.MonotonicLongValues.get()";"0.37381834";"11449.292
 ms";"11449.292 ms";"11449.292 ms";"11449.292 ms";"73"
"java.util.AbstractCollection.toArray()";"0.3550354";"10874.009 ms";"10874.009 
ms";"10874.009 ms";"10874.009 ms";"63"
"org.apache.lucene.index.FieldInfos.()";"0.319384";"9782.08 ms";"9782.08 
ms";"150232.207 ms";"150232.207 ms";"69"
"org.apache.lucene.uninverting.DocTermOrds$Iterator.read()";"0.26374063";"8077.837
 ms";"8077.837 ms";"8077.837 ms";"8077.837 ms";"64"
"java.util.Collections.max()";"0.21143816";"6475.919 ms";"6475.919 
ms";"6475.919 ms";"6475.919 ms";"46"
"org.apache.solr.request.DocValuesFacets.getCounts()";"0.090463296";"2770.706 
ms";"2770.706 ms";"410211.805 ms";"410211.805 ms";"60"
"org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll()";"0.05521328";"1691.07
 ms";"1691.07 ms";"1691.07 ms";"1691.07 ms";"2"
"java.lang.System.identityHashCode[native]()";"0.031022375";"950.152 
ms";"950.152 ms";"950.152 ms";"950.152 ms";"3"
"org.apache.solr.util.LongPriorityQueue.downHeap()";"0.026107715";"799.626 
ms";"799.626 ms";"799.626 ms";"799.626 ms";"9"
"java.util.Collections$UnmodifiableCollection$1.hasNext()";"0.020632554";"631.933
 ms";"631.933 ms";"631.933 ms";"631.933 ms";"6"
"org.apache.lucene.index.FieldInfo.()";"0.011944577";"365.838 
ms";"365.838 ms";"365.838 ms";"365.838 ms";"4"
"java.util.WeakHashMap.put()";"0.011552288";"353.823 ms";"353.823 ms";"353.823 
ms";"353.823 ms";"2"
"org.apache.lucene.index.FieldInfos$Builder.add()";"0.010934878";"334.913 
ms";"334.913 ms";"211565.8 ms";"211565.8 ms";"181"
"org.eclipse.jetty.server.HttpOutput.write()";"0.010440102";"319.759 
ms";"319.759 ms";"482.602 ms";"482.602 ms";"9"
"java.util.WeakHashMap.get()";"0.010077655";"308.658 ms";"308.658 ms";"308.658 
ms";"308.658 ms";"2"
"org.apache.lucene.util.LongValues.get()";"0.010070211";"308.43 ms";"308.43 
ms";"11757.722 ms";"11757.722 ms";"74"
"org.apache.lucene.util.fst.FST.findTargetArc()";"0.00995512";"304.905 
ms";"304.905 ms";"304.905 ms";"304.905 ms";"2"
"org.apache.lucene.uninverting.DocTermOrds.uninvert()";"0.008576673";"262.686 
ms";"262.686 ms";"262.686 ms";"262.686 ms";"1"
"org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter()";"0.0037003446";"113.334
 ms";"113.334 ms";"341247.51 ms";"341247.51 ms";"58"
"java.lang.String.getBytes()";"0.0032916984";"100.818 ms";"100.818 ms";"100.818 
ms";"100.818 ms";"1"
"java.nio.DirectByteBuffer.get()";"0.0032914046";"100.809 ms";"100.809 
ms";"100.809 ms";"100.809 ms";"1"
"org.eclipse.jetty.http.DateGenerator.doFormatDate()";"0.003270182";"100.159 
ms";"100.159 ms";"100.159 ms";"100.159 ms";"1"

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Mikhail Khludnev
Uwe
Unfortunately, fieldValueCache was dropped there:
https://github.com/apache/lucene-solr/commit/fca4c22da81447867533fb28c0f06150cdc2eb9d#diff-5ac9dc7b128b4dd99b764060759222b2R428
However, I see that it's still available in the new JSON facets (thus, you
would need to amend your app).
Otherwise, you can postpone the migration till 5.4, or apply and measure the
DV facet fix from SOLR-7730.
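The JSON Facet API mentioned above takes a JSON request body rather than flat parameters. A hedged sketch of a terms facet on the author_facet field from earlier in the thread — the limits and facet name are illustrative assumptions:

```python
import json

# JSON Facet API request body; "author_facet" is the field from the
# query earlier in the thread, everything else is a generic sketch.
body = {
    "query": "*:*",
    "limit": 0,  # we only want facet counts, no documents
    "facet": {
        "authors": {
            "type": "terms",
            "field": "author_facet",
            "limit": 30,
        }
    },
}
payload = json.dumps(body)
print(payload)
```

The payload would be POSTed to the collection's select/query endpoint as the request body.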

On Thu, Sep 24, 2015 at 12:38 PM, Uwe Reh 
wrote:

> Am 22.09.2015 um 18:10 schrieb Walter Underwood:
>
>> Faceting on an author field is almost always a bad idea. Or at least a
>> slow, expensive idea.
>>
>
> Hi Wunder,
> In a technical context, the 'author' facet may be suboptimal. In our
> businesses (library services) it's a core feature.
> Yes the facet is expensive, but thanks to the fieldValueCache (4.10)
> sufficiently fast.
>
> uwe
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: Solr Join between two indexes taking too long.

2015-09-24 Thread Russell Taylor
Hi Mikhail,
The initial join query is slow-ish but okay. It's the paging through the 
results which is very fast, which is what we need, and 5.3 gives us that. In 
4.10 it was still slow.

So yes we are happy with the query time.

The longValue field is not multi-valued; I tried multipleValuesPerDocument=false but 
it didn't make a difference. I only created this field as I read that longs 
perform better when used as the join field.
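For reference, the working join from this thread is an ordinary local-params query; a sketch of assembling it as a request parameter (core and field names as quoted in the thread — note Solr's join parser expects lowercase from/to, so the uppercase FROM/TO in the archived mails looks like an archive artifact):

```python
from urllib.parse import urlencode, parse_qs

# Local-params join query as used in the thread; the string join field
# is what worked ("longValue" hit the NUMERIC-vs-SORTED docValues error).
join_q = ("{!join from=stringValue to=stringValue "
          "fromIndex=indexB score=none}universe:uniA")
params = urlencode({"q": join_q, "rows": 10})
print(params)
```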

Thanks

Russ. 
 

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: 22 September 2015 10:50
To: solr-user
Subject: Re: Solr Join between two indexes taking too long.

Russ,
Do you mean you accelerated only the side case with small cardinality, or is 
your problem resolved in general and 2.3 sec fine for you?

Regarding longValues: is it multivalued? It might work if the {!join} parser passes

multipleValuesPerDocument=false into

http://lucene.apache.org/core/5_2_1/join/org/apache/lucene/search/join/JoinUtil.html#createJoinQuery%28java.lang.String,%20boolean,%20java.lang.String,%20org.apache.lucene.search.Query,%20org.apache.lucene.search.IndexSearcher,%20org.apache.lucene.search.join.ScoreMode%29


On Tue, Sep 22, 2015 at 11:30 AM, Russell Taylor < 
russell.tay...@interactivedata.com> wrote:

> Hi,
>  I've just set-up solr 5.3 and moved the two indexes into it.
>
> I've tried the join below
> {!join FROM=stringValue TO=stringValue fromIndex=indexB 
> score=none}universe:uniA"
> and the qTime was 2.3 seconds
> "QTime": 2326
>
> Which is much much better than before, I also tried the longValues 
> {!join FROM=longValue TO=longValue fromIndex=indexB 
> score=none}universe:uniA"
> But now get this error
> java.lang.IllegalStateException: unexpected docvalues type NUMERIC for 
> field 'longValue' (expected one of [SORTED, SORTED_SET]). Use 
> UninvertingReader or index with docvalues.
> The longValues were a conversion of the strings so unless someone 
> suggests that the performance would be much better using longs I’ll 
> stick with the strings.
>
>
> Thanks for your help Mikhail and Upayavira, now I just need to get the 
> firm to move to 5.3 ☺
>
> Russ.
>
>
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: 14 September 2015 15:54
> To: solr-user
> Subject: Re: Solr Join between two indexes taking too long.
>
> Why? It's enough to just open index by Solr 5.3 instance. No need to 
> reindex.
>
> On Mon, Sep 14, 2015 at 4:57 PM, Russell Taylor <
> russell.tay...@interactivedata.com russell.tay...@interactivedata.com>> wrote:
>
> > Looks like I won't be able to test this out on 5.3.
> >
> > Thanks for all your help.
> >
> > Russ.
> >
> > -Original Message-
> > From: Russell Taylor
> > Sent: 11 September 2015 14:00
> > To: solr-user@lucene.apache.org
> > Subject: RE: Solr Join between two indexes taking too long.
> >
> > It will take a little while to set-up a 5.3 version, hopefully I'll 
> > have some results later next week.
> > 
> > From: Mikhail Khludnev [mkhlud...@griddynamics.com]
> > Sent: 11 September 2015 12:59
> > To: Russell Taylor
> > Subject: Re: Solr Join between two indexes taking too long.
> >
> >
> > On Wed, Sep 9, 2015 at 1:10 PM, Russell Taylor <
> > russell.tay...@interactivedata.com > russell.tay...@interactivedata.com russell.tay...@interactivedata.com>>> wrote:
> > Do you have a link to your talk at Berlin Buzzwords?
> >
> >
> > https://berlinbuzzwords.de/file/bbuzz-2015-mikhailv-khludnev-approac
> > hi
> > ng-join-index-lucene
> >
> > How did it go with Solr 5.3 and {!join score=...} ?
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
> >
> > ***
> > This message (including any files transmitted with it) may contain 
> > confidential and/or proprietary information, is the property of 
> > Interactive Data Corporation and/or its subsidiaries, and is 
> > directed only to the addressee(s). If you are not the designated 
> > recipient or have reason to believe you received this message in 
> > error, please delete this message from your system and notify the 
> > sender immediately. An unintended recipient's disclosure, copying, 
> > distribution, or use of this message or any attachments is 
> > prohibited
> and may be unlawful.
> > ***
> >
> >

Re: Solr Log Analysis

2015-09-24 Thread Ahmet Arslan
Hi Tarala,

Never used my self but please see:

https://soleami.com/blog/soleami-start_en.html

Ahmet



On Thursday, September 24, 2015 3:16 AM, "Tarala, Magesh"  
wrote:
I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3 replicas 
for the collection.

I want to analyze the logs to extract the queries and query times. Is there a 
tool or script someone has created already for this?

Thanks,
Magesh 
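Absent a ready-made tool, the queries and QTimes can be pulled out of a standard Solr request log with a short script. A sketch — the log format is assumed to match the usual `params={...} ... QTime=N` request-log lines, so adjust the pattern to what your logs actually contain:

```python
import re

# Matches the params={...} and QTime=N fragments of a typical Solr
# request-log line.
LINE_RE = re.compile(r"params=\{(?P<params>[^}]*)\}.*?QTime=(?P<qtime>\d+)")

def extract(line):
    """Return (query params, QTime) for a request-log line, else None."""
    m = LINE_RE.search(line)
    if not m:
        return None
    return m.group("params"), int(m.group("qtime"))

sample = ("INFO  ... o.a.s.c.S.Request [collection1] webapp=/solr "
          "path=/select params={q=foo&rows=10} hits=42 status=0 QTime=2326")
print(extract(sample))
```

Feeding every line of the log through `extract` and sorting by QTime gives a quick slow-query report.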


Re: query parsing

2015-09-24 Thread Alessandro Benedetti
I would focus on this :

"

> 5> now kick off the DIH job and look again.
>
Now it shows a histogram, but most of the "terms" are long -- the full
texts of (the table.column) eventlogtext.logtext, including the whitespace
(with %0A used for newline characters)...  So, it appears it is not being
tokenized properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippet
for the field that seems not to tokenise?
Can you show us (even a screenshot is fine) the related schema browser
page?
Could it be an encoding problem?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira :

> typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog next to it, you might want to
> delete that also.
>
> If you don't have a data dir, I wonder whether you set the data dir when
> creating your core or collection. Typically the instance dir and data
> dir aren't needed.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
> > -zkRun command when you start Solr or the -zkHost; highly unlikely. On
> > the
> > admin page there would be a "cloud" link on the left side, I really doubt
> > one's there.
> >
> > You should have a data directory, it should be the parent of the index
> > and
> > tlog directories. As a sanity check, try looking at the analysis page.
> > Type
> > a bunch of words in the left hand side indexing box and uncheck the
> > verbose
> > box. As you can tell I'm grasping at straws. I'm still puzzled why you
> > don't have a "data" directory here, but that shouldn't really matter. How
> > did you create this index? I don't mean data import handler, rather how did
> > you create the core that you're indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers 
> > wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think you
> are
> > >> when you 'rm -rf data'
> > >>
> > >> Just ignore the Elall field for now I should think, although get rid
> of it
> > >> if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir.  Did you mean "index" dir?  I removed 3 index
> > > directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there should
> be
> > >> nothing there!
> > >>
> > > Correct!  It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the full
> > > texts of (the table.column) eventlogtext.logtext, including the
> whitespace
> > > (with %0A used for newline characters)...  So, it appears it is not
> being
> > > tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that you
> have
> > >> some very
> > >> long tokens presumably with whitespace) indicates that you aren't
> really
> > >> blowing
> > >> the index away between indexing.
> > >>
> > > Well, I did this time for sure.  I verified that initially, because it
> > > showed there was no term info until I DIH'd again.
> > >
> > >> Are you perhaps in Solr Cloud with more than one replica?
> > >>
> > > Not that I know of, but being new to Solr, there could be things going
> on
> > > that I'm not aware of.  How can I tell?  I certainly didn't set
> anything up
> > > for solrCloud deliberately.
> > >
> > >> In that case you
> > >> might be getting the index replicated on startup assuming you didn't
> > >> blow away all replicas. If you are in SolrCloud, I'd just delete the
> > >> collection and
> > >> start over, after insuring that you'd pushed the configset up to
> > >> Zookeeper.
> > >>
> > >> BTW, I always look at the schema.xml file from the Solr admin window
> just
> > >> as
> > >> a sanity check in these situations.
> > >>
> > > Good idea!  But the one shown in the browser is identical to the one
> I've
> > > been editing!  So that's not an issue.
> > >
> > >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Weird Exception

2015-09-24 Thread Upayavira
What were you trying to do when this happened?

Bear in mind that a tdate field *is* by definition multivalued. It is
indexed at multiple levels of precision.

I bet if you reindexed with this field as a date field type, you won't
hit this issue. The date field type is still a TrieDateField, but it has
a precision of 0, meaning it is only indexed once.
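The point that a tdate field is "indexed at multiple levels of precision" can be made concrete: a trie-encoded 64-bit value produces roughly ceil(64 / precisionStep) index terms, while precisionStep=0 (the plain date type) produces one. A small sketch of that arithmetic — an illustration of the term counts, not of Lucene's actual encoding:

```python
import math

def trie_terms_per_value(precision_step, bits=64):
    """Number of index terms a single trie-encoded value produces.

    precisionStep=0 is Solr shorthand for "index at full precision
    only", i.e. a single term per value.
    """
    if precision_step == 0:
        return 1
    return math.ceil(bits / precision_step)

print(trie_terms_per_value(6))   # tdate with precisionStep=6
print(trie_terms_per_value(0))   # plain "date" type: one term per value
```

So the tdate definition in this thread indexes eleven terms per date value, which is why the field looks multi-valued at the Lucene level.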

Upayavira

On Thu, Sep 24, 2015, at 03:00 AM, Ravi Solr wrote:
> Recently I installed 5.3.0 and started seeing weird exception which
> baffled
> me. Has anybody encountered such an issue ? The indexing was done via
> DIH,
> the field that is causing the issue is a TrieDateField defined as below
> 
> 
> <fieldType name="tdate" class="solr.TrieDateField"
>  precisionStep="6" positionIncrementGap="0"/>
> 
> Looking at the following exception, it feels like a wrong exception;
> it just doesn't jibe well with the field definitions
> 
> 
> 2015-09-24 01:43:33.667 ERROR (qtp1256054824-13) [c:collection1
> s:shard1 r:core_node2 x:collection1_shard1_replica4] o.a.s.c.SolrCore
> java.lang.IllegalStateException: Type mismatch: pubdatetime was
> indexed with multiple values per document, use SORTED_SET instead
>   at 
> org.apache.lucene.uninverting.FieldCacheImpl$SortedDocValuesCache.createValue(FieldCacheImpl.java:679)
>   at 
> org.apache.lucene.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:190)
>   at 
> org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:647)
>   at 
> org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:627)
>   at 
> org.apache.lucene.uninverting.UninvertingReader.getSortedDocValues(UninvertingReader.java:257)
>   at 
> org.apache.lucene.index.MultiDocValues.getSortedValues(MultiDocValues.java:316)
>   at 
> org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedDocValues(SlowCompositeReaderWrapper.java:125)
>   at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:304)
>   at 
> org.apache.solr.search.function.OrdFieldSource.getValues(OrdFieldSource.java:99)
>   at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:116)
>   at 
> org.apache.lucene.queries.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
>   at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:274)
>   at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
>   at 
> org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:256)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:769)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
>   at 
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
>   at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1682)
>   at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501)
>   at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555)
>   at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at 

Re: Taking Solr to production with docker

2015-09-24 Thread Ugo Matrangolo
Hi,

still don't get it :)

With Solr 5 it auto-installs itself as a supervised service and works
really nicely in an AWS CloudFormation template.

Best
Ugo


On Wed, Sep 23, 2015 at 10:01 PM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> we get to run commands like, docker run solr and have solr working!
>
> containers make new application deployments a breeze.
>
> On Wed, Sep 23, 2015 at 4:35 PM, Ugo Matrangolo 
> wrote:
>
> > Hi,
> >
> > just curious: what do you get by running Solr in a Docker container?
> >
> > Best
> > Ugo
> >
> > On Wed, Sep 23, 2015 at 5:39 PM, Vincenzo D'Amore 
> > wrote:
> >
> > > Hi Doug,
> > >
> > > I have ported solrcloud to docker too, I hope you can found something
> > > interesting here:
> > >
> > > https://github.com/freedev/solrcloud-zookeeper-docker
> > >
> > > This project runs an zookeeper ensemble and a sorlcloud cluster within
> > many
> > > containers.
> > >
> > > Now, in my spare time, I'm trying to port this project to kubernetes, I
> > > would like to split all these containers to many nodes.
> > >
> > > Cheers,
> > > Vincenzo
> > >
> > >
> > > On Wed, Sep 23, 2015 at 6:15 PM, Christopher Bradford <
> > > cbradf...@opensourceconnections.com> wrote:
> > >
> > > > Hi Doug,
> > > >
> > > > The Dockerfiles we use have been pushed up to a GitHub repo
> > > > https://github.com/o19s/solr-docker. I'm happy to answer any
> questions
> > > > about them.
> > > >
> > > > ~Chris
> > > >
> > > > On Wed, Sep 23, 2015 at 8:47 AM Doug Turnbull <
> > > > dturnb...@opensourceconnections.com> wrote:
> > > >
> > > > > Our test Solr and Elasticsearch instances for Quepid(
> > http://quepid.com
> > > )
> > > > are
> > > > > now hosted on docker (specifically kubernetes)
> > > > >
> > > > > It's worked pretty well. I'd suggest if you're curious to speak to
> my
> > > > > devops focussed colleague Chris Bradford that has a great deal of
> > > > > experience here. I haven't encountered any issues that would lead
> me
> > to
> > > > > describe it as "not ready for production"
> > > > >
> > > > > Doug
> > > > >
> > > > >
> > > > > On Wednesday, September 23, 2015, Upayavira 
> wrote:
> > > > >
> > > > >>
> > > > >>
> > > > >> On Wed, Sep 23, 2015, at 02:00 PM,
> aurelien.mazo...@francelabs.com
> > > > >> wrote:
> > > > >> > Hi Solr community,
> > > > >> >
> > > > >> > I can find many blog posts on how to deploy Solr with docker
> but I
> > > am
> > > > >> > wondering if Solr/Docker is really ready for production.
> > > > >> > Has anybody ever ran Solr in production with Docker?
> > > > >>
> > > > >> Hi Aurelien,
> > > > >>
> > > > >> I'm wondering if there's anything specific that is needed to run
> > Solr
> > > > >> inside Docker? Is there something you have in mind?
> > > > >>
> > > > >> Upayavira
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> > > Connections
> > > > > , LLC | 240.476.9983
> > > > > Author: Relevant Search 
> > > > > This e-mail and all contents, including attachments, is considered
> to
> > > be
> > > > > Company Confidential unless explicitly stated otherwise, regardless
> > > > > of whether attachments are marked as such.
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Vincenzo D'Amore
> > > email: v.dam...@gmail.com
> > > skype: free.dev
> > > mobile: +39 349 8513251
> > >
> >
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Uwe Reh

Am 22.09.2015 um 18:10 schrieb Walter Underwood:

Faceting on an author field is almost always a bad idea. Or at least a slow, 
expensive idea.


Hi Wunder,
In a technical context, the 'author' facet may be suboptimal. In our 
businesses (library services) it's a core feature.
Yes the facet is expensive, but thanks to the fieldValueCache (4.10) 
sufficiently fast.


uwe



Help on autocomplete / suggester

2015-09-24 Thread Andrea Gazzarini

Hi guys,
as part of a customer requirement, I need to provide an autocomplete / 
suggester feature. For that reason I started looking at the Suggester 
Component.


The target Solr version is not yet determined: I mean, there's another 
project in production, of the same customer, which is using Solr 4.7.1 
(no SolrCloud, just a master with two slaves) so I guess they will 
extend those instances with additional cores, but I'm not sure about 
that, maybe they would like to migrate towards a new version  / new 
architecture.


Anyway, after reading some info [1]  [2]  [3] about the Suggester, and 
after trying a bit with some sample data, I'm not sure if that fits my 
needs, because the proposed suggestions must follow these criteria:


 * prefix search: Vi = *Vi*terbo, *Vi*cenza, *Vi*llanova (max priority)
 * infix search: Vi = A*vi*gliano, Tar*vi*sio (medium priority)
 * fuzzy (phonetic?) search: Vitr= Viterbo, Vitorchiano (lowest
   priority, this requirement could be even removed)

 * everything could be constrained by one or more filter queries
 * each suggestion could contain (depending on the use case) up to five
   additional attributes (other than the suggestion itself), so the
   payload provided by the Suggester couldn't be enough (or it would
   require a custom encoding of such data in that field)
 * in a couple of scenarios, the search needs to be executed on several
   fields, with different boosts (e.g. description, address, code) and
   the corresponding suggestions come from another field (e.g. name)
 * I don't have any incremental / delta indexing issue, the whole
   dataset is not huge, a couple of millions of database records, with
   a low grow rate, and I can recreate everything from scratch using
   the DIH

Do you think this is something for the built-in Suggester? Or is this 
something that it's better to implement with a RequestHandler with  
something like (e)dismax and ngramming?


Many thanks in advance
Andrea

[1] https://cwiki.apache.org/confluence/display/solr/Suggester
[2] http://lucidworks.com/blog/solr-suggester/
[3] http://alexbenedetti.blogspot.it/2015/07/solr-you-complete-me.html





Re: Is docValues required in Solr 5.x for distributed result grouping?

2015-09-24 Thread Alessandro Benedetti
I didn't know docValues were used apart from sorting and faceting (fc
algorithm) on fields.
Of course the docValues data structure can be used by anything that wants
to retrieve the column-based view of documents per field, but is it
documented anywhere all the ways it is used in Solr?

Cheers

2015-09-24 2:33 GMT+01:00 Shawn Heisey :

> On 9/18/2015 3:27 PM, Shawn Heisey wrote:
> > A query that works fine in Solr 4.9.1 doesn't work in 5.2.1 with the
> > same schema.  The field that I am grouping on does not have docValues.
> > I get this exception:
> >
> > java.lang.IllegalStateException: unexpected docvalues type SORTED_SET
> > for field 'ip' (expected=SORTED). Use UninvertingReader or index with
> > docvalues.
>
> No response in five days ... I guess I'll go ahead and raise that issue.
>
> Thanks,
> Shawn
>
>
>


-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: query parsing

2015-09-24 Thread Duck Geraint (ext) GBJH
Okay, so maybe I'm missing something here (I'm still relatively new to Solr 
myself), but am I right in thinking the following is still in your 
solrconfig.xml file:

  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>

If so, wouldn't using a managed schema make several of your field definitions 
inside the schema.xml file semi-redundant?

Regards,
Geraint


Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
Sent: 24 September 2015 09:23
To: solr-user@lucene.apache.org
Subject: Re: query parsing

I would focus on this :

"

> 5> now kick off the DIH job and look again.
>
Now it shows a histogram, but most of the "terms" are long -- the full texts of 
(the table.column) eventlogtext.logtext, including the whitespace (with %0A 
used for newline characters)...  So, it appears it is not being tokenized 
properly, correct?"
Can you open the schema.xml from your Solr UI and show us the snippet for 
the field that seems not to tokenise?
Can you show us (even a screenshot is fine) the related schema browser page?
Could it be a problem of encoding?
Following Erick's details about the analysis, what are your results?

Cheers

2015-09-24 8:04 GMT+01:00 Upayavira :

> typically, the index dir is inside the data dir. Delete the index dir
> and you should be good. If there is a tlog next to it, you might want
> to delete that also.
>
> If you don't have a data dir, I wonder whether you set the data dir
> when creating your core or collection. Typically the instance dir and
> data dir aren't needed.
>
> Upayavira
>
> On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > OK, this is bizarre. You'd have had to set up SolrCloud by
> > specifying the -zkRun command when you start Solr or the -zkHost;
> > highly unlikely. On the admin page there would be a "cloud" link on
> > the left side, I really doubt one's there.
> >
> > You should have a data directory, it should be the parent of the
> > index and tlog directories. As a sanity check try looking at the
> > analysis page.
> > Type
> > a bunch of words in the left hand side indexing box and uncheck the
> > verbose box. As you can tell I'm grasping at straws. I'm still
> > puzzled why you don't have a "data" directory here, but that
> > shouldn't really matter. How did you create this index? I don't mean
> > data import handler more how did you create the core that you're
> > indexing to?
> >
> > Best,
> > Erick
> >
> > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
> > 
> > wrote:
> >
> > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > >
> > >> Then my next guess is you're not pointing at the index you think
> > >> you
> are
> > >> when you 'rm -rf data'
> > >>
> > >> Just ignore the Elall field for now I should think, although get
> > >> rid
> of it
> > >> if you don't think you need it.
> > >>
> > >> DIH should be irrelevant here.
> > >>
> > >> So let's back up.
> > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > >>
> > > I have no "data" dir.  Did you mean "index" dir?  I removed 3
> > > index directories (2 for spelling):
> > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > >
> > >> 2> start Solr
> > >> 3> do NOT re-index.
> > >> 4> look at your index via the schema-browser. Of course there
> > >> 4> should
> be
> > >> nothing there!
> > >>
> > > Correct!  It said "there is no term info :("
> > >
> > >> 5> now kick off the DIH job and look again.
> > >>
> > > Now it shows a histogram, but most of the "terms" are long -- the
> > > full texts of (the table.column) eventlogtext.logtext, including
> > > the
> whitespace
> > > (with %0A used for newline characters)...  So, it appears it is
> > > not
> being
> > > tokenized properly, correct?
> > >
> > >> Your logtext field should have only single tokens. The fact that
> > >> you
> have
> > >> some very
> > >> long tokens presumably with whitespace) indicates that you aren't
> really
> > >> blowing
> > >> the index away between indexing.
> > >>
> > > Well, I did this time for sure.  I verified that initially,
> > > because it showed there was no term info until I DIH'd again.
> > >
> > >> Are you perhaps in Solr Cloud with more than one replica?
> > >>
> > > Not that I know of, but being new to Solr, there could be things
> > > going
> on
> > > that I'm not aware of.  How can I tell?  I certainly didn't set
> anything up
> > > for solrCloud deliberately.
> > >
> > >> In that case you
> > >> might be getting the index replicated on startup assuming you
> > >> didn't blow away all replicas. If you are in SolrCloud, I'd just
> > >> delete the collection and start over, after insuring that you'd
> > >> pushed the configset up to Zookeeper.
> > >>
> > >> BTW, I always look at the schema.xml file from the Solr admin
> > >> window
> just
> > >> as
> > >> a sanity check in these situations.
> > >>
> > > Good idea!  But 

Re: Taking Solr to production with docker

2015-09-24 Thread Joe Lawson
I think this sums up "what is docker":
https://youtu.be/F44GtxHO2MI
On Sep 24, 2015 4:37 AM, "Ugo Matrangolo"  wrote:

> Hi,
>
> still don't get it :)
>
> With Solr 5 it auto-installs itself as a supervised service and works
> really nice in an AWS CloudFormation template.
>
> Best
> Ugo
>
>
> On Wed, Sep 23, 2015 at 10:01 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
> > we get to run commands like, docker run solr and have solr working!
> >
> > containers make new application deployments a breeze.
> >
> > On Wed, Sep 23, 2015 at 4:35 PM, Ugo Matrangolo <
> ugo.matrang...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > just curious: what you get by running Solr into a Docker container ?
> > >
> > > Best
> > > Ugo
> > >
> > > On Wed, Sep 23, 2015 at 5:39 PM, Vincenzo D'Amore 
> > > wrote:
> > >
> > > > Hi Doug,
> > > >
> > > > I have ported solrcloud to docker too, I hope you can find something
> > > > interesting here:
> > > >
> > > > https://github.com/freedev/solrcloud-zookeeper-docker
> > > >
> > > > This project runs a zookeeper ensemble and a solrcloud cluster
> within
> > > many
> > > > containers.
> > > >
> > > > Now, in my spare time, I'm trying to port this project to
> kubernetes, I
> > > > would like to split all these containers to many nodes.
> > > >
> > > > Cheers,
> > > > Vincenzo
> > > >
> > > >
> > > > On Wed, Sep 23, 2015 at 6:15 PM, Christopher Bradford <
> > > > cbradf...@opensourceconnections.com> wrote:
> > > >
> > > > > Hi Doug,
> > > > >
> > > > > The Dockerfiles we use have been pushed up to a GitHub repo
> > > > > https://github.com/o19s/solr-docker. I'm happy to answer any
> > questions
> > > > > about them.
> > > > >
> > > > > ~Chris
> > > > >
> > > > > On Wed, Sep 23, 2015 at 8:47 AM Doug Turnbull <
> > > > > dturnb...@opensourceconnections.com> wrote:
> > > > >
> > > > > > Our test Solr and Elasticsearch instances for Quepid(
> > > http://quepid.com
> > > > )
> > > > > are
> > > > > > now hosted on docker (specifically kubernetes)
> > > > > >
> > > > > > It's worked pretty well. I'd suggest if you're curious to speak
> to
> > my
> > > > > > devops focussed colleague Chris Bradford that has a great deal of
> > > > > > experience here. I haven't encountered any issues that would lead
> > me
> > > to
> > > > > > describe it as "not ready for production"
> > > > > >
> > > > > > Doug
> > > > > >
> > > > > >
> > > > > > On Wednesday, September 23, 2015, Upayavira 
> > wrote:
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Sep 23, 2015, at 02:00 PM,
> > aurelien.mazo...@francelabs.com
> > > > > >> wrote:
> > > > > >> > Hi Solr community,
> > > > > >> >
> > > > > >> > I can find many blog posts on how to deploy Solr with docker
> > but I
> > > > am
> > > > > >> > wondering if Solr/Docker is really ready for production.
> > > > > >> > Has anybody ever run Solr in production with Docker?
> > > > > >>
> > > > > >> Hi Aurelien,
> > > > > >>
> > > > > >> I'm wondering if there's anything specific that is needed to run
> > > Solr
> > > > > >> inside Docker? Is there something you have in mind?
> > > > > >>
> > > > > >> Upayavira
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> > > > Connections
> > > > > > , LLC | 240.476.9983
> > > > > > Author: Relevant Search 
> > > > > > This e-mail and all contents, including attachments, is
> considered
> > to
> > > > be
> > > > > > Company Confidential unless explicitly stated otherwise,
> regardless
> > > > > > of whether attachments are marked as such.
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Vincenzo D'Amore
> > > > email: v.dam...@gmail.com
> > > > skype: free.dev
> > > > mobile: +39 349 8513251
> > > >
> > >
> >
>


Re: Dismax and StandardTokenizer: OR queries despite mm=100%

2015-09-24 Thread Andreas Hubold

Thank you, autoGeneratePhraseQueries did the job.

I assume that this setting just affects query generation and I don't 
need to reindex after changing the field type accordingly. Is this correct?
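For anyone finding this thread later, the change amounts to flipping one attribute on the field type; a sketch follows (the analyzer chain shown is an assumption, keep your own):

```shell
# Sketch of a schema.xml fieldType with autoGeneratePhraseQueries enabled,
# so multi-token terms like "CC-WAV-001" become phrase queries under dismax.
# The tokenizer/filter chain here is illustrative only.
cat > text-general-snippet.xml <<'EOF'
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
EOF
```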


BTW, I just found SOLR-3589 where the same issue was reported and fixed 
for the edismax parser. It seems it was fixed for edismax but not for 
dismax.


Andreas

Ahmet Arslan wrote on 09/23/2015 09:25 PM:

Hi Andreas,

That's weird. It looks like the mm calculation is done before tokenization
takes place.

You can try to set autoGeneratePhraseQueries to true
or replace dashes with white-spaces at client side.

Ahmet



On Wednesday, September 23, 2015 10:00 PM, Andreas Hubold 
 wrote:
Hi,

we're using Solr 4.10.4 and the dismax query parser to search across
multiple fields. One of the fields is configured with a
StandardTokenizer (type "text_general"). I set mm=100% to only get hits
that match all terms.

This does not seem to work for queries that are split into multiple
tokens. For example a query for "CC-WAV-001" (tokenized to "cc", "wav",
"001") returns documents that only have "cc" in it. I need a result with
documents that contains all tokens - as returned by the /select handler.

Is there a way to force AND semantics for such dismax queries? I also
tried to set q.op=AND but it did not help.

The query is parsed as:

(+DisjunctionMaxQuery(((textbody:cc textbody:wav textbody:001) |
productCode:CC-WAV-001)~0.1) DisjunctionMaxQuery((textbody:"cc wav
001")~0.1))/no_coord

Thanks in advance!

Regards,
Andreas





Re: Dismax and StandardTokenizer: OR queries despite mm=100%

2015-09-24 Thread Ahmet Arslan
Hi Andreas,

You are correct, no re-indexing required for autoGeneratePhraseQueries.

Ahmet



On Thursday, September 24, 2015 3:52 PM, Andreas Hubold 
 wrote:
Thank you, autoGeneratePhraseQueries did the job.

I assume that this setting just affects query generation and I don't 
need to reindex after changing the field type accordingly. Is this correct?

BTW, I just found SOLR-3589 where the same issue was reported and fixed 
for the edismax parser. It seems it was fixed for edismax but not for 
dismax.

Andreas


Ahmet Arslan wrote on 09/23/2015 09:25 PM:
> Hi Andreas,
>
> That's weird. It looks like the mm calculation is done before tokenization
> takes place.
>
> You can try to set autoGeneratePhraseQueries to true
> or replace dashes with white-spaces at client side.
>
> Ahmet
>
>
>
> On Wednesday, September 23, 2015 10:00 PM, Andreas Hubold 
>  wrote:
> Hi,
>
> we're using Solr 4.10.4 and the dismax query parser to search across
> multiple fields. One of the fields is configured with a
> StandardTokenizer (type "text_general"). I set mm=100% to only get hits
> that match all terms.
>
> This does not seem to work for queries that are split into multiple
> tokens. For example a query for "CC-WAV-001" (tokenized to "cc", "wav",
> "001") returns documents that only have "cc" in it. I need a result with
> documents that contains all tokens - as returned by the /select handler.
>
> Is there a way to force AND semantics for such dismax queries? I also
> tried to set q.op=AND but it did not help.
>
> The query is parsed as:
>
> (+DisjunctionMaxQuery(((textbody:cc textbody:wav textbody:001) |
> productCode:CC-WAV-001)~0.1) DisjunctionMaxQuery((textbody:"cc wav
> 001")~0.1))/no_coord
>
> Thanks in advance!
>
> Regards,
> Andreas
>


Re: Taking Solr to production with docker

2015-09-24 Thread Epo Jemba
Ugo

Don't get me wrong, I know Solr already scales by itself.
But in some cases, in order to be fully usable, Solr has to be
integrated/extended with a bunch of other apps: your own,
load-balancers, frontends, etc.
For all of those to work together the right way, you end up with a
higher-level solution that organizes everything together,
in my case kubernetes (it could have been others: rancher.io, tectonic,
panama, etc.)




2015-09-24 10:37 GMT+02:00 Ugo Matrangolo :

> Hi,
>
> still don't get it :)
>
> With Solr 5 it auto-installs itself as a supervised service and works
> really nice in an AWS CloudFormation template.
>
> Best
> Ugo
>
>
> On Wed, Sep 23, 2015 at 10:01 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
> > we get to run commands like, docker run solr and have solr working!
> >
> > containers make new application deployments a breeze.
> >
> > On Wed, Sep 23, 2015 at 4:35 PM, Ugo Matrangolo <
> ugo.matrang...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > just curious: what you get by running Solr into a Docker container ?
> > >
> > > Best
> > > Ugo
> > >
> > > On Wed, Sep 23, 2015 at 5:39 PM, Vincenzo D'Amore 
> > > wrote:
> > >
> > > > Hi Doug,
> > > >
> > > > I have ported solrcloud to docker too, I hope you can find something
> > > > interesting here:
> > > >
> > > > https://github.com/freedev/solrcloud-zookeeper-docker
> > > >
> > > > This project runs a zookeeper ensemble and a solrcloud cluster
> within
> > > many
> > > > containers.
> > > >
> > > > Now, in my spare time, I'm trying to port this project to
> kubernetes, I
> > > > would like to split all these containers to many nodes.
> > > >
> > > > Cheers,
> > > > Vincenzo
> > > >
> > > >
> > > > On Wed, Sep 23, 2015 at 6:15 PM, Christopher Bradford <
> > > > cbradf...@opensourceconnections.com> wrote:
> > > >
> > > > > Hi Doug,
> > > > >
> > > > > The Dockerfiles we use have been pushed up to a GitHub repo
> > > > > https://github.com/o19s/solr-docker. I'm happy to answer any
> > questions
> > > > > about them.
> > > > >
> > > > > ~Chris
> > > > >
> > > > > On Wed, Sep 23, 2015 at 8:47 AM Doug Turnbull <
> > > > > dturnb...@opensourceconnections.com> wrote:
> > > > >
> > > > > > Our test Solr and Elasticsearch instances for Quepid(
> > > http://quepid.com
> > > > )
> > > > > are
> > > > > > now hosted on docker (specifically kubernetes)
> > > > > >
> > > > > > It's worked pretty well. I'd suggest if you're curious to speak
> to
> > my
> > > > > > devops focussed colleague Chris Bradford that has a great deal of
> > > > > > experience here. I haven't encountered any issues that would lead
> > me
> > > to
> > > > > > describe it as "not ready for production"
> > > > > >
> > > > > > Doug
> > > > > >
> > > > > >
> > > > > > On Wednesday, September 23, 2015, Upayavira 
> > wrote:
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Sep 23, 2015, at 02:00 PM,
> > aurelien.mazo...@francelabs.com
> > > > > >> wrote:
> > > > > >> > Hi Solr community,
> > > > > >> >
> > > > > >> > I can find many blog posts on how to deploy Solr with docker
> > but I
> > > > am
> > > > > >> > wondering if Solr/Docker is really ready for production.
> > > > > >> > Has anybody ever run Solr in production with Docker?
> > > > > >>
> > > > > >> Hi Aurelien,
> > > > > >>
> > > > > >> I'm wondering if there's anything specific that is needed to run
> > > Solr
> > > > > >> inside Docker? Is there something you have in mind?
> > > > > >>
> > > > > >> Upayavira
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> > > > Connections
> > > > > > , LLC | 240.476.9983
> > > > > > Author: Relevant Search 
> > > > > > This e-mail and all contents, including attachments, is
> considered
> > to
> > > > be
> > > > > > Company Confidential unless explicitly stated otherwise,
> regardless
> > > > > > of whether attachments are marked as such.
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Vincenzo D'Amore
> > > > email: v.dam...@gmail.com
> > > > skype: free.dev
> > > > mobile: +39 349 8513251
> > > >
> > >
> >
>


Re: Taking Solr to production with docker

2015-09-24 Thread Martijn Koster

> On 23 Sep 2015, at 15:13, Upayavira  wrote:
> 
> I'm wondering if there's anything specific that is needed to run Solr
> inside Docker? Is there something you have in mind?

There isn't really. See https://hub.docker.com/r/makuk66/docker-solr/ for an
example.

I'm working on turning that into an official Docker Hub image
(https://github.com/docker-solr/docker-solr), so that you will be able to just 
"docker run solr" and have Solr up and running. That is currently under review 
by Docker Inc on https://github.com/docker-library/official-images/pull/107.
If you add a +1 there it might speed the process up.

-- Martijn



Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Yonik Seeley
On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh  wrote:
> our bibliographic index (~20M entries) runs fine with Solr 4.10.3
> With Solr 5.3 faceted searching is constantly incredibly slow (~ 20 seconds)
[...]
>
> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> cumulative_hitratio of 1.


Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
removed as part of LUCENE-5666, causing these performance regressions.

This code had been evolved over years to be very fast for specific use
cases.  No one facet algorithm is going to be optimal for everyone, so
it's important we have multiple.  But use of the UnInvertedField was
removed without any notification or discussion whatsoever (and
obviously no benchmarking), and was only discovered later by Solr devs
in SOLR-7190 that it was essentially dead code.


When I brought back my "JSON Facet API" work to Solr (which was based
on 4.10.x) it came with a heavily modified version of UnInvertedField
that is available via the JSON Facet API.  It might currently work
better for your usecase.

On your normal (non-docValues) index, you can try something like the
following to see what the performance would be:

$ curl http://yxz/solr/hebis/query -d 'q=darwin&
json.facet={
  authors : { type:terms, field:author_facet, limit:30 },
  material_access : { type:terms, field:material_access, limit:30 },
  material_brief : { type:terms, field:material_brief, limit:30 },
  rvk : { type:terms, field:rvk_facet, limit:30 },
  lang : { type:terms, field:language, limit:30 },
  dept : { type:terms, field:department_3, limit:30 }
}'

There were other changes in LUCENE-5666 that will probably slow down
faceting on the single valued fields as well (so this may still be a
little slower than 4.10.x), but hopefully it would be more
competitive.

-Yonik


Re: Is docValues required in Solr 5.x for distributed result grouping?

2015-09-24 Thread Tomoko Uchida
Hi,

> Of course the Doc Values data structure can be used by anything who wants
> to retrieve the column base view of documents per field, but is anywhere
> documented all the ways it's used in Solr ?

According to the Solr guide about docvalues, it is used in faceting,
sorting, and grouping.
https://cwiki.apache.org/confluence/display/solr/DocValues

> In Lucene 4.0, a new approach was introduced. DocValue fields are now
column-oriented fields with a  document-to-value mapping built at index
time. This approach promises to relieve some of the memory requirements of
the fieldCache and make lookups for faceting, sorting, and grouping much
faster.

I'm not sure about other docvalues uses not mentioned here... I'd also like
to know if there are.

Thanks,
Tomoko
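As a sketch, turning on docValues for the grouping field from Shawn's stack trace would look like this in schema.xml (the field type is an assumption); note that an existing field must be fully reindexed after adding docValues:

```shell
# Sketch of a schema.xml field declaration with docValues enabled. The field
# name "ip" comes from the stack trace in this thread; type="string" is an
# assumption. A full reindex is required after this change.
cat > ip-field-snippet.xml <<'EOF'
<field name="ip" type="string" indexed="true" stored="true" docValues="true"/>
EOF
```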

2015-09-24 17:26 GMT+09:00 Alessandro Benedetti 
:

> I didn't know DocValues to be used apart from sorting and faceting (fc
> algorithm) on fields.
> Of course the Doc Values data structure can be used by anything who wants
> to retrieve the column base view of documents per field, but is anywhere
> documented all the ways it's used in Solr ?
>
> Cheers
>
> 2015-09-24 2:33 GMT+01:00 Shawn Heisey :
>
> > On 9/18/2015 3:27 PM, Shawn Heisey wrote:
> > > A query that works fine in Solr 4.9.1 doesn't work in 5.2.1 with the
> > > same schema.  The field that I am grouping on does not have docValues.
> > > I get this exception:
> > >
> > > java.lang.IllegalStateException: unexpected docvalues type SORTED_SET
> > > for field 'ip' (expected=SORTED). Use UninvertingReader or index with
> > > docvalues.
> >
> > No response in five days ... I guess I'll go ahead and raise that issue.
> >
> > Thanks,
> > Shawn
> >
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Solr Cloud: Indexing in a Map reduce Job with Kerberos

2015-09-24 Thread Bertrand Venzal
Hi all,

As a bit of background, we're trying to run a map-reduce job on a Hadoop 
cluster (CDH version 5.4.5) which involves writing to Solr during the Map 
phase. To accomplish this, we are using the Solrj library, version 
4.10.3-cdh5.4.5. In the driver class which launches the MR job, we have 
managed to get solrj to authenticate correctly with Kerberos by creating a 
jaas file which points to a previously created keytab file. However, the 
distributed map-reduce job will not have the jaas file or keytab file on the 
worker nodes it is distributed to. We were wondering whether it is possible 
to set up Kerberos with Solrj in the driver step of a map-reduce job, so 
that the individual mappers and reducers remain authenticated while the job 
runs?

Thanks and kind regards,
Bertrand
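For reference, the driver-side JAAS file mentioned above is typically along these lines; this is a sketch, and the keytab path and principal are placeholders:

```shell
# Sketch of a JAAS configuration for Kerberos keytab login, pointed at by
# -Djava.security.auth.login.config. Path and principal are placeholders.
cat > jaas-client.conf <<'EOF'
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/solr.keytab"
  storeKey=true
  useTicketCache=false
  principal="user@EXAMPLE.COM";
};
EOF
# The driver JVM would then be started with, e.g.:
# -Djava.security.auth.login.config=jaas-client.conf
```

The open question in the thread, keeping mappers/reducers authenticated on worker nodes that lack the keytab, is not solved by this file alone; it only covers the driver step.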


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Alessandro Benedetti
Yonik, I am really excited about the JSON faceting module.
I find it really interesting.
Are there any pros/cons to using it, or is it definitely the "approach of
the future"?
I saw your benchmarks and they seem impressive.

I have not read the whole topic in detail, just briefly, but is JSON
faceting using different faceting algorithms from the standard ones (enum
and fc)?
I cannot find the algorithm parameter to be passed in the JSON facets.
Is it using a completely different approach?
Is the algorithm used documented anywhere?
This could give very good insight on when to use it.

Cheers

2015-09-24 14:58 GMT+01:00 Yonik Seeley :

> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh 
> wrote:
> > our bibliographic index (~20M entries) runs fine with Solr 4.10.3
> > With Solr 5.3 faceted searching is constantly incredibly slow (~ 20
> seconds)
> [...]
> >
> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> > 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> > cumulative_hitratio of 1.
>
>
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.
>
> This code had been evolved over years to be very fast for specific use
> cases.  No one facet algorithm is going to be optimal for everyone, so
> it's important we have multiple.  But use of the UnInvertedField was
> removed without any notification or discussion whatsoever (and
> obviously no benchmarking), and was only discovered later by Solr devs
> in SOLR-7190 that it was essentially dead code.
>
>
> When I brought back my "JSON Facet API" work to Solr (which was based
> on 4.10.x) it came with a heavily modified version of UnInvertedField
> that is available via the JSON Facet API.  It might currently work
> better for your usecase.
>
> On your normal (non-docValues) index, you can try something like the
> following to see what the performance would be:
>
> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
> json.facet={
>   authors : { type:terms, field:author_facet, limit:30 },
>   material_access : { type:terms, field:material_access, limit:30 },
>   material_brief : { type:terms, field:material_brief, limit:30 },
>   rvk : { type:terms, field:rvk_facet, limit:30 },
>   lang : { type:terms, field:language, limit:30 },
>   dept : { type:terms, field:department_3, limit:30 }
> }'
>
> There were other changes in LUCENE-5666 that will probably slow down
> faceting on the single valued fields as well (so this may still be a
> little slower than 4.10.x), but hopefully it would be more
> competitive.
>
> -Yonik
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: query parsing

2015-09-24 Thread Erick Erickson
Geraint:

Good Catch! I totally missed that. So all of our focus on schema.xml has
been... totally irrelevant. Now that you pointed that out, there's also the
addition: add-unknown-fields-to-the-schema, which indicates you started
this up in "schemaless" mode.

In short, solr is trying to guess what your field types should be and
guessing wrong (again and again and again). This is the classic weakness of
schemaless. It's great for indexing stuff fast, but if it guesses wrong
you're stuck.


So to the original problem: I'd start over and either
1> use the regular setup, not schemaless
or
2> use the _managed_ schema API to explicitly add fields and fieldTypes to
the managed schema
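For option 2, adding a field through the managed Schema API is a single POST; here is a sketch assuming a core named eventLog (taken from earlier in the thread) and a field/type matching the discussion:

```shell
# Sketch: explicitly define the logtext field via the managed Schema API
# (Solr 5.x), instead of letting schemaless guess. Core name "eventLog" and
# type "text_general" are assumptions; adjust to your setup.
cat > add-logtext-field.json <<'EOF'
{
  "add-field": {
    "name": "logtext",
    "type": "text_general",
    "indexed": true,
    "stored": true
  }
}
EOF
# Then, against a running Solr instance:
# curl -X POST -H 'Content-type:application/json' \
#   --data-binary @add-logtext-field.json \
#   http://localhost:8983/solr/eventLog/schema
```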

Best,
Erick

On Thu, Sep 24, 2015 at 2:02 AM, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:

> Okay, so maybe I'm missing something here (I'm still relatively new to
> Solr myself), but am I right in thinking the following is still in your
> solrconfig.xml file:
>
>   
> true
> managed-schema
>   
>
> If so, wouldn't using a managed schema make several of your field
> definitions inside the schema.xml file semi-redundant?
>
> Regards,
> Geraint
>
>
> Geraint Duck
> Data Scientist
> Toxicology and Health Sciences
> Syngenta UK
> Email: geraint.d...@syngenta.com
>
>
> -Original Message-
> From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com]
> Sent: 24 September 2015 09:23
> To: solr-user@lucene.apache.org
> Subject: Re: query parsing
>
> I would focus on this :
>
> "
>
> > 5> now kick off the DIH job and look again.
> >
> Now it shows a histogram, but most of the "terms" are long -- the full
> texts of (the table.column) eventlogtext.logtext, including the whitespace
> (with %0A used for newline characters)...  So, it appears it is not being
> tokenized properly, correct?"
> Can you open from your Solr ui , the schema xml and show us the snippets
> for that field that seems to not tokenise ?
> Can you show us ( even a screenshot is fine) the schema browser page
> related ?
> Could be a problem of encoding ?
> Following Erick details about the analysis, what are your results ?
>
> Cheers
>
> 2015-09-24 8:04 GMT+01:00 Upayavira :
>
> > typically, the index dir is inside the data dir. Delete the index dir
> > and you should be good. If there is a tlog next to it, you might want
> > to delete that also.
> >
> > If you don't have a data dir, I wonder whether you set the data dir
> > when creating your core or collection. Typically the instance dir and
> > data dir aren't needed.
> >
> > Upayavira
> >
> > On Wed, Sep 23, 2015, at 10:46 PM, Erick Erickson wrote:
> > > OK, this is bizarre. You'd have had to set up SolrCloud by
> > > specifying the -zkRun command when you start Solr or the -zkHost;
> > > highly unlikely. On the admin page there would be a "cloud" link on
> > > the left side, I really doubt one's there.
> > >
> > > You should have a data directory, it should be the parent of the
> > > index and tlog directories. As a sanity check try looking at the
> > > analysis page.
> > > Type
> > > a bunch of words in the left hand side indexing box and uncheck the
> > > verbose box. As you can tell I'm grasping at straws. I'm still
> > > puzzled why you don't have a "data" directory here, but that
> > > shouldn't really matter. How did you create this index? I don't mean
> > > data import handler more how did you create the core that you're
> > > indexing to?
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Sep 23, 2015 at 10:16 AM, Mark Fenbers
> > > 
> > > wrote:
> > >
> > > > On 9/23/2015 12:30 PM, Erick Erickson wrote:
> > > >
> > > >> Then my next guess is you're not pointing at the index you think
> > > >> you
> > are
> > > >> when you 'rm -rf data'
> > > >>
> > > >> Just ignore the Elall field for now I should think, although get
> > > >> rid
> > of it
> > > >> if you don't think you need it.
> > > >>
> > > >> DIH should be irrelevant here.
> > > >>
> > > >> So let's back up.
> > > >> 1> go ahead and "rm -fr data" (with Solr stopped).
> > > >>
> > > > I have no "data" dir.  Did you mean "index" dir?  I removed 3
> > > > index directories (2 for spelling):
> > > > cd /localapps/dev/eventLog; rm -rfv index solr/spFile solr/spIndex
> > > >
> > > >> 2> start Solr
> > > >> 3> do NOT re-index.
> > > >> 4> look at your index via the schema-browser. Of course there
> > > >> 4> should
> > be
> > > >> nothing there!
> > > >>
> > > > Correct!  It said "there is no term info :("
> > > >
> > > >> 5> now kick off the DIH job and look again.
> > > >>
> > > > Now it shows a histogram, but most of the "terms" are long -- the
> > > > full texts of (the table.column) eventlogtext.logtext, including
> > > > the
> > whitespace
> > > > (with %0A used for newline characters)...  So, it appears it is
> > > > not
> > being
> > > > tokenized properly, correct?
> > > >
> > > >> Your logtext field should have only single tokens. The fact that
> > > >> you
> > have
> > > >> 

Re: Cloud Deployment Strategy... In the Cloud

2015-09-24 Thread Dan Davis
ant is very good at this sort of thing, and easier for Java devs to learn
than Make.  Python has a module called fabric that is also very fine, but
for my dev. ops. it is another thing to learn.
I tend to divide things into three categories:

 - Things that have to do with system setup, and need to be run as root.
For this I write a bash script (I should learn puppet, but...)
 - Things that have to do with one time installation as a solr admin user
with /bin/bash, including upconfig.   For this I use an ant build.
 - Normal operational procedures.   For this, I typically use Solr admin or
scripts, but I wish I had time to create a good webapp (or money to
purchase Fusion).
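The one-time upconfig-plus-create step can be captured in a short script; a sketch with placeholder zkhosts, paths, and collection name (Solr 5.x file layout):

```shell
# Sketch of a one-time SolrCloud bootstrap script: upload a config set to
# ZooKeeper with zkcli, then create a collection referencing it. All hosts,
# paths, shard/replica counts, and names below are placeholders.
cat > bootstrap-collection.sh <<'EOF'
#!/bin/bash
set -e
ZKHOST=zk1:2181,zk2:2181,zk3:2181
# 1) push the config set to ZooKeeper (Solr 5.x script location)
/opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost "$ZKHOST" \
  -cmd upconfig -confdir ./conf -confname mycoll_conf
# 2) create the collection against that config via the Collections API
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2&collection.configName=mycoll_conf"
EOF
chmod +x bootstrap-collection.sh
```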


On Thu, Sep 24, 2015 at 12:39 AM, Erick Erickson 
wrote:

> bq: What tools do you use for the "auto setup"? How do you get your config
> automatically uploaded to zk?
>
> Both uploading the config to ZK and creating collections are one-time
> operations, usually done manually. Currently uploading the config set is
> accomplished with zkCli (yes, it's a little clumsy). There's a JIRA to put
> this into solr/bin as a command though. They'd be easy enough to script in
> any given situation though with a shell script or wizard
>
> Best,
> Erick
>
> On Wed, Sep 23, 2015 at 7:33 PM, Steve Davids  wrote:
>
> > What tools do you use for the "auto setup"? How do you get your config
> > automatically uploaded to zk?
> >
> > On Tue, Sep 22, 2015 at 2:35 PM, Gili Nachum 
> wrote:
> >
> > > Our auto setup sequence is:
> > > 1.deploy 3 zk nodes
> > > 2. Deploy solr nodes and start them connecting to zk.
> > > 3. Upload collection config to zk.
> > > 4. Call create collection rest api.
> > > 5. Done. SolrCloud ready to work.
> > >
> > > Don't yet have automation for replacing or adding a node.
> > > On Sep 22, 2015 18:27, "Steve Davids"  wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to come up with a repeatable process for deploying a Solr
> > > Cloud
> > > > cluster from scratch along with the appropriate security groups, auto
> > > > scaling groups, and custom Solr plugin code. I saw that LucidWorks
> > > created
> > > > a Solr Scale Toolkit but that seems to be more of a one-shot deal
> than
> > > > really setting up your environment for the long-haul. Here is where we
> > are
> > > > at right now:
> > > >
> > > >1. ZooKeeper ensemble is easily brought up via a Cloud Formation
> > > Script
> > > >2. We have an RPM built to lay down the Solr distribution + Custom
> > > >plugins + Configuration
> > > >3. Solr machines come up and connect to ZK
> > > >
> > > > Now, we are using Puppet which could easily create the
> core.properties
> > > file
> > > > for the corresponding core and have ZK get bootstrapped but that
> seems
> > to
> > > > be a no-no these days... So, can anyone think of a way to get ZK
> > > > bootstrapped automatically with pre-configured Collection
> > configurations?
> > > > Also, is there a recommendation on how to deal with machines that are
> > > > coming/going? As I see it machines will be getting spun up and
> > terminated
> > > > from time to time and we need to have a process of dealing with that,
> > the
> > > > first idea was to just use a common node name so if a machine was
> > > > terminated a new one can come up and replace that particular node but
> > on
> > > > second thought it would seem to require an auto scaling group *per*
> > node
> > > > (so it knows what node name it is). For a large cluster this seems
> > crazy
> > > > from a maintenance perspective, especially if you want to be elastic
> > with
> > > > regard to the number of live replicas for peak times. So, then the
> next
> > > > idea was to have some outside observer listen to when new ec2
> instances
> > > are
> > > > created or terminated (via CloudWatch SQS) and make the appropriate
> API
> > > > calls to either add the replica or delete it, this seems doable but
> > > perhaps
> > > > not the simplest solution that could work.
> > > >
> > > > I was hoping others have already gone through this and have valuable
> > > advice
> > > > to give, we are trying to setup Solr Cloud the "right way" so we
> don't
> > > get
> > > > nickel-and-dimed to death from an O perspective.
> > > >
> > > > Thanks,
> > > >
> > > > -Steve
> > > >
> > >
> >
>


Re: Is docValues required in Solr 5.x for distributed result grouping?

2015-09-24 Thread Oliver Schrenk
The error message looks a lot like this bug

https://issues.apache.org/jira/browse/SOLR-7495

group.faceting is broken for numeric values. Does it mean that I have to enable 
docvalues for every field that I want to facet on?
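(For reference, a sketch, not from the thread: turning on docValues is a schema.xml change along these lines, followed by a full reindex. The field name below matches the 'ip' field from the error quoted further down; the type is a guess.)

```xml
<field name="ip" type="string" indexed="true" stored="true" docValues="true"/>
```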



On 24 Sep 2015, at 17:02, Tomoko Uchida 
> wrote:

Hi,

Of course the DocValues data structure can be used by anything that wants
to retrieve the column-based view of documents per field, but are all the
ways it's used in Solr documented anywhere?

According to the Solr guide about docvalues, it is used in faceting,
sorting, and grouping.
https://cwiki.apache.org/confluence/display/solr/DocValues

In Lucene 4.0, a new approach was introduced. DocValue fields are now
column-oriented fields with a document-to-value mapping built at index
time. This approach promises to relieve some of the memory requirements of
the fieldCache and make lookups for faceting, sorting, and grouping much
faster.
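As a rough illustration of why a column-oriented (document-to-value) layout helps for sorting, faceting, and grouping, here is a toy sketch in plain Python (not Solr code; the field values and doc ids are made up):

```python
from collections import Counter

# An inverted index maps term -> doc ids, which is great for search
# but awkward when you need "what is the value of field X for doc N".
inverted = {"red": [0, 2], "blue": [1]}

# A docValues-style column is the forward mapping, doc id -> value,
# laid out per field, so faceting/sorting is a direct array lookup.
color_column = ["red", "blue", "red"]  # index = doc id

# Faceting with the column: one pass, one counter bump per doc.
facet_counts = Counter(color_column[doc] for doc in [0, 1, 2])
assert facet_counts == {"red": 2, "blue": 1}

# Faceting with only the inverted index requires "uninverting":
# walk every term's posting list to rebuild the doc -> value map,
# which is the FieldCache-style work docValues avoids at query time.
uninverted = {}
for term, docs in inverted.items():
    for d in docs:
        uninverted[d] = term
assert [uninverted[d] for d in range(3)] == color_column
```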

I'm not sure about other docValues uses not mentioned here... I'd also like
to know if there are any.

Thanks,
Tomoko

2015-09-24 17:26 GMT+09:00 Alessandro Benedetti 
:

I didn't know DocValues were used for anything apart from sorting and faceting
(the fc algorithm) on fields.
Of course the DocValues data structure can be used by anything that wants
to retrieve the column-based view of documents per field, but are all the
ways it's used in Solr documented anywhere?

Cheers

2015-09-24 2:33 GMT+01:00 Shawn Heisey :

On 9/18/2015 3:27 PM, Shawn Heisey wrote:
A query that works fine in Solr 4.9.1 doesn't work in 5.2.1 with the
same schema.  The field that I am grouping on does not have docValues.
I get this exception:

java.lang.IllegalStateException: unexpected docvalues type SORTED_SET
for field 'ip' (expected=SORTED). Use UninvertingReader or index with
docvalues.

No response in five days ... I guess I'll go ahead and raise that issue.

Thanks,
Shawn





--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England




Re: Solr Log Analysis

2015-09-24 Thread Will Hayes
Hi - If you use Logstash for the log ingestion, the config below will parse
what you need for search analytics, including: terms, zero-result queries,
response times, and more. Happy to assist off-list if you have any questions.

https://github.com/LucidWorks/silkusecases/blob/master/searchanalytics/silk_solrlogs.conf

If you want an open source Solr centric solution pre-packaged for the
ingestion and visualization of those logs you can download Silk here:

http://lucidworks.com/fusion/silk/

Best Regards,
-wh

--
Will Hayes | CEO
email. w...@lucidworks.com | direct. +1.415.997.9455
Lucidworks | 340 Brannan St. Suite 400 | San Francisco, CA
www.lucidworks.com | www.linkedin.com/in/willhayes | @iamwillhayes


On Thu, Sep 24, 2015 at 3:23 AM, Ahmet Arslan 
wrote:

> Hi Tarala,
>
> Never used my self but please see:
>
> https://soleami.com/blog/soleami-start_en.html
>
> Ahmet
>
>
>
> On Thursday, September 24, 2015 3:16 AM, "Tarala, Magesh" 
> wrote:
> I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3
> replicas for the collection.
>
> I want to analyze the logs to extract the queries and query times. Is
> there a tool or script someone has created already for this?
>
> Thanks,
> Magesh
>
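Not a packaged tool, but as a rough sketch of the kind of one-off script people use for this: the Python below pulls the query and QTime out of a request-log line. The sample line is hand-made in the usual 4.x request-log shape; real layouts vary with your log4j configuration, so the pattern would need adjusting.

```python
import re

# Hand-made sample line in the typical Solr 4.x request-log shape;
# adjust the pattern to whatever your log4j layout actually emits.
line = ('INFO  - 2015-09-24 10:15:30.123; org.apache.solr.core.SolrCore; '
        '[collection1] webapp=/solr path=/select '
        'params={q=solr+cloud&rows=10} hits=42 status=0 QTime=12')

pattern = re.compile(
    r'path=(?P<path>\S+)\s+'         # handler path, e.g. /select
    r'params=\{(?P<params>[^}]*)\}'  # raw request parameters
    r'(?:\s+hits=(?P<hits>\d+))?'    # hits (absent on some lines)
    r'.*?QTime=(?P<qtime>\d+)')      # query time in ms

m = pattern.search(line)
# Split the raw params into a dict to get at q, rows, etc.
params = dict(p.split('=', 1) for p in m.group('params').split('&') if '=' in p)
print(params.get('q'), m.group('hits'), m.group('qtime'))  # solr+cloud 42 12
```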


[ANNOUNCE] Apache Lucene 5.3.1 released

2015-09-24 Thread Noble Paul
24 September 2015, Apache Solr™ 5.3.1 available


The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1


Solr is the popular, blazing fast, open source NoSQL search platform

from the Apache Lucene project. Its major features include powerful

full-text search, hit highlighting, faceted search, dynamic

clustering, database integration, rich document (e.g., Word, PDF)

handling, and geospatial search. Solr is highly scalable, providing

fault tolerant distributed search and indexing, and powers the search

and navigation features of many of the world's largest internet sites.


This release contains various bug fixes and optimizations since the
5.3.0 release. The release is available for immediate download at:


  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html


Please read CHANGES.txt for a full list of new features and changes:


  https://lucene.apache.org/solr/5_3_1/changes/Changes.html


Solr 5.3.1 includes these bug fixes.


 * security.json is not loaded on server start

 * RuleBasedAuthorization plugin does not respect the
collection-admin-edit permission

 * Fix VelocityResponseWriter template encoding issue. Templates must
be UTF-8 encoded

 * SimplePostTool (also bin/post) -filetypes "*" now works properly in
'web' mode

 * example/files update-script.js to be Java 7 and 8 compatible.

 * SolrJ could not make requests to handlers with '/admin/' prefix

 * Use of timeAllowed can cause incomplete filters to be cached and
incorrect results to be returned on subsequent requests

 * VelocityResponseWriter's $resource.get(key,baseName,locale) to use
specified locale.

 * Fix the exclusion filter so that collections that start with js,
css, img, tpl can be accessed.

 * Resolve XSS issue in Admin UI stats page


Known issues:

 * On Windows, bin/solr.cmd script fails to start correctly when using
relative path with -s parameter. Use absolute path as a workaround.
https://issues.apache.org/jira/browse/SOLR-8073


See the CHANGES.txt file included with the release for a full list of
changes and further details.


Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)


Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Noble Paul
on behalf of Lucene PMC


[ANNOUNCE] Apache Solr 5.3.1 released

2015-09-24 Thread Noble Paul
24 September 2015, Apache Solr™ 5.3.1 available


The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1


Solr is the popular, blazing fast, open source NoSQL search platform

from the Apache Lucene project. Its major features include powerful

full-text search, hit highlighting, faceted search, dynamic

clustering, database integration, rich document (e.g., Word, PDF)

handling, and geospatial search. Solr is highly scalable, providing

fault tolerant distributed search and indexing, and powers the search

and navigation features of many of the world's largest internet sites.


This release contains various bug fixes and optimizations since the
5.3.0 release. The release is available for immediate download at:


  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html


Please read CHANGES.txt for a full list of new features and changes:


  https://lucene.apache.org/solr/5_3_1/changes/Changes.html


Solr 5.3.1 includes these bug fixes.


 * security.json is not loaded on server start

 * RuleBasedAuthorization plugin does not respect the
collection-admin-edit permission

 * Fix VelocityResponseWriter template encoding issue. Templates must
be UTF-8 encoded

 * SimplePostTool (also bin/post) -filetypes "*" now works properly in
'web' mode

 * example/files update-script.js to be Java 7 and 8 compatible.

 * SolrJ could not make requests to handlers with '/admin/' prefix

 * Use of timeAllowed can cause incomplete filters to be cached and
incorrect results to be returned on subsequent requests

 * VelocityResponseWriter's $resource.get(key,baseName,locale) to use
specified locale.

 * Fix the exclusion filter so that collections that start with js,
css, img, tpl can be accessed.

 * Resolve XSS issue in Admin UI stats page


Known issues:

 * On Windows, bin/solr.cmd script fails to start correctly when using
relative path with -s parameter. Use absolute path as a workaround.
https://issues.apache.org/jira/browse/SOLR-8073


See the CHANGES.txt file included with the release for a full list of
changes and further details.

Noble Paul
on behalf of Lucene PMC


Re: [ANNOUNCE] Apache Lucene 5.3.1 released

2015-09-24 Thread Noble Paul
Wrong title

On Thu, Sep 24, 2015 at 10:55 PM, Noble Paul  wrote:
> 24 September 2015, Apache Solr™ 5.3.1 available
>
>
> The Lucene PMC is pleased to announce the release of Apache Solr 5.3.1
>
>
> Solr is the popular, blazing fast, open source NoSQL search platform
>
> from the Apache Lucene project. Its major features include powerful
>
> full-text search, hit highlighting, faceted search, dynamic
>
> clustering, database integration, rich document (e.g., Word, PDF)
>
> handling, and geospatial search. Solr is highly scalable, providing
>
> fault tolerant distributed search and indexing, and powers the search
>
> and navigation features of many of the world's largest internet sites.
>
>
> This release contains various bug fixes and optimizations since the
> 5.3.0 release. The release is available for immediate download at:
>
>
>   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
>
> Please read CHANGES.txt for a full list of new features and changes:
>
>
>   https://lucene.apache.org/solr/5_3_1/changes/Changes.html
>
>
> Solr 5.3.1 includes these bug fixes.
>
>
>  * security.json is not loaded on server start
>
>  * RuleBasedAuthorization plugin does not respect the
> collection-admin-edit permission
>
>  * Fix VelocityResponseWriter template encoding issue. Templates must
> be UTF-8 encoded
>
>  * SimplePostTool (also bin/post) -filetypes "*" now works properly in
> 'web' mode
>
>  * example/files update-script.js to be Java 7 and 8 compatible.
>
>  * SolrJ could not make requests to handlers with '/admin/' prefix
>
>  * Use of timeAllowed can cause incomplete filters to be cached and
> incorrect results to be returned on subsequent requests
>
>  * VelocityResponseWriter's $resource.get(key,baseName,locale) to use
> specified locale.
>
>  * Fix the exclusion filter so that collections that start with js,
> css, img, tpl can be accessed.
>
>  * Resolve XSS issue in Admin UI stats page
>
>
> Known issues:
>
>  * On Windows, bin/solr.cmd script fails to start correctly when using
> relative path with -s parameter. Use absolute path as a workaround.
> https://issues.apache.org/jira/browse/SOLR-8073
>
>
> See the CHANGES.txt file included with the release for a full list of
> changes and further details.
>
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
>
>
> Note: The Apache Software Foundation uses an extensive mirroring
> network for distributing releases. It is possible that the mirror you
> are using may not have replicated the release yet. If that is the
> case, please try another mirror. This also goes for Maven access.
>
> Noble Paul
> on behalf of Lucene PMC



-- 
-
Noble Paul


Different ports for search and upload request

2015-09-24 Thread Siddhartha Singh Sandhu
Hi,

I wanted to know if we can configure different ports as end points for
uploading and searching API. Also, if someone could point me in the right
direction.

Regards,

Sid.


Re: Solr Log Analysis

2015-09-24 Thread Otis Gospodnetić
Hi Magesh,

Here are 2 more solutions you could use:

1) Site Search Analytics  -- this
basically integrates into your search results via JS like Google Analytics
and automatically captures a bunch of search and click data and gives you a
number of search-focused reports/charts (at aggregate level) out of the box.
2) Logsene  - this will take any events and
let you search then, filter them, graph them, aggregate on them, alert on
them, put them on custom dashboards, correlate with app/server logs, with
Solr metrics (via SPM for Solr
), etc. etc.  If
you happen to like/know Kibana or Banana or Splunk this will look very
familiar.  As a matter of fact, Kibana 4 is integrated into Logsene.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Sep 23, 2015 at 8:16 PM, Tarala, Magesh  wrote:

> I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3
> replicas for the collection.
>
> I want to analyze the logs to extract the queries and query times. Is
> there a tool or script someone has created already for this?
>
> Thanks,
> Magesh
>


Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes

If I configure my filterCache like this:

<filterCache ... autowarmCount="10"/>

and I have <= 10 distinct filter queries I ever use, does that mean I’ve
effectively disabled cache invalidation? So my cached filter query results
will never change? (short of JVM restart)

I’m unclear on whether autowarm simply copies the value into the new
searcher’s cache or whether it tries to rebuild the results of the cached
filter query based on the new searcher’s view of the data.



Re: Different ports for search and upload request

2015-09-24 Thread Siddhartha Singh Sandhu
Thank you so much.

Safe to ignore the following (not a query):

*Never did this.* But how about this crazy idea:

Take an Amazon EFS and share it between two EC2 instances. Use one EC2 endpoint
to update the index on EFS while the other reads from it. This way each EC2
instance can use its own compute and not share its resources amongst Solr threads.

Regards,
Sid.

On Thu, Sep 24, 2015 at 5:17 PM, Shawn Heisey  wrote:

> On 9/24/2015 2:01 PM, Siddhartha Singh Sandhu wrote:
> > I wanted to know if we can configure different ports as end points for
> > uploading and searching API. Also, if someone could point me in the right
> > direction.
>
> From our perspective, no.
>
> I have no idea whether it is possible at all ... it might be something
> that a servlet container expert could figure out, or it might require
> code changes to Solr itself.
>
> You probably need another mailing list specifically for the container.
> For virtually all 5.x installs, the container is Jetty.  In earlier
> versions, it could be any container.
>
> Another possibility would be putting an intelligent proxy in front of
> Solr and having it only accept certain handler paths on certain ports,
> then forward them to the common port on the Solr server.
>
> If you did manage to do this, it would require custom client code.  None
> of the Solr clients for programming languages have a facility for
> separate ports.
>
> Thanks,
> Shawn
>
>


Re: Different ports for search and upload request

2015-09-24 Thread Siddhartha Singh Sandhu
Hey,

Thank you for your reply.

The use case would be that I can concurrently load data into my index via
one port and then make that data available (NRT search) to users through
another high-availability search endpoint, without the fear of my requests
clogging one port.

Regards,

Sid.

On Thu, Sep 24, 2015 at 4:45 PM, Susheel Kumar 
wrote:

> I am not aware of such a feature in Solr but do want to know your use case
> / logic behind coming up with different ports.  If it is for security /
> exposing to user, usually Solr shouldn't be exposed to user directly but
> via application / service / api.
>
> Thanks,
> Susheel
>
> On Thu, Sep 24, 2015 at 4:01 PM, Siddhartha Singh Sandhu <
> sandhus...@gmail.com> wrote:
>
> > Hi,
> >
> > I wanted to know if we can configure different ports as end points for
> > uploading and searching API. Also, if someone could point me in the right
> > direction.
> >
> > Regards,
> >
> > Sid.
> >
>


Re: Different ports for search and upload request

2015-09-24 Thread Yonik Seeley
On Thu, Sep 24, 2015 at 5:00 PM, Siddhartha Singh Sandhu
 wrote:
> Hey,
>
> Thank you for your reply.
>
> The use case would be that I can concurrently load data into my index via
> one port and then make that(*data) available(NRT search) to user through
> another high availability search endpoint without the fear of my requests
> clogging one port.

Not yet, but it's in development.
It won't require a different port either... different endpoints will
be able to have different request queues and thread pools.

-Yonik


Re: Different ports for search and upload request

2015-09-24 Thread Alexandre Rafalovitch
But they would still compete for the servlet engine's threads. Putting
them on different ports will not change anything. Now, if you wanted
to put them on different network interfaces, that could be something.
But I do not think it is possible, as the select and update are both
just configuration defined end-points (in solrconfig.xml).

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 24 September 2015 at 17:00, Siddhartha Singh Sandhu
 wrote:
> Hey,
>
> Thank you for your reply.
>
> The use case would be that I can concurrently load data into my index via
> one port and then make that(*data) available(NRT search) to user through
> another high availability search endpoint without the fear of my requests
> clogging one port.
>
> Regards,
>
> Sid.
>
> On Thu, Sep 24, 2015 at 4:45 PM, Susheel Kumar 
> wrote:
>
>> I am not aware of such a feature in Solr but do want to know your use case
>> / logic behind coming up with different ports.  If it is for security /
>> exposing to user, usually Solr shouldn't be exposed to user directly but
>> via application / service / api.
>>
>> Thanks,
>> Susheel
>>
>> On Thu, Sep 24, 2015 at 4:01 PM, Siddhartha Singh Sandhu <
>> sandhus...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I wanted to know if we can configure different ports as end points for
>> > uploading and searching API. Also, if someone could point me in the right
>> > direction.
>> >
>> > Regards,
>> >
>> > Sid.
>> >
>>


Re: Different ports for search and upload request

2015-09-24 Thread Shawn Heisey
On 9/24/2015 2:01 PM, Siddhartha Singh Sandhu wrote:
> I wanted to know if we can configure different ports as end points for
> uploading and searching API. Also, if someone could point me in the right
> direction.

From our perspective, no.

I have no idea whether it is possible at all ... it might be something
that a servlet container expert could figure out, or it might require
code changes to Solr itself.

You probably need another mailing list specifically for the container.
For virtually all 5.x installs, the container is Jetty.  In earlier
versions, it could be any container.

Another possibility would be putting an intelligent proxy in front of
Solr and having it only accept certain handler paths on certain ports,
then forward them to the common port on the Solr server.
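(As a sketch of that proxy idea, assuming nginx; the ports, path patterns, and the 403 policy are all placeholders, not a tested config:)

```nginx
# Port 8984 accepts only updates, port 8985 only searches;
# both forward to the one real Solr port.
server {
    listen 8984;
    location ~ ^/solr/.*/update { proxy_pass http://localhost:8983; }
    location / { return 403; }
}
server {
    listen 8985;
    location ~ ^/solr/.*/select { proxy_pass http://localhost:8983; }
    location / { return 403; }
}
```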

If you did manage to do this, it would require custom client code.  None
of the Solr clients for programming languages have a facility for
separate ports.

Thanks,
Shawn



Re: Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
Answering my own question: it looks like the default filterCache regenerator
uses the old cache's keys to re-execute those queries in the context of the
new searcher, and does nothing with the old cache values.

So, the new searcher’s cache contents will be consistent with that
searcher’s view, regardless of whether it was populated via autowarm.
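A toy model of that behavior (plain Python, not Solr internals; the "searchers" and queries are made up for illustration):

```python
# Autowarming: the old cache contributes only its KEYS (the filter
# queries); each result is recomputed against the new searcher, so
# warmed entries always reflect the new index view.
def execute(query, searcher):
    # Stand-in for running a filter query; returns matching doc ids.
    return frozenset(d for d in searcher if query(d))

old_searcher = {1, 2, 3}
new_searcher = {2, 3, 4}          # doc 1 deleted, doc 4 added

is_even = lambda d: d % 2 == 0
old_cache = {is_even: execute(is_even, old_searcher)}

# Autowarm: replay the cached queries, discard the cached values.
new_cache = {q: execute(q, new_searcher) for q in old_cache}

assert old_cache[is_even] == {2}
assert new_cache[is_even] == {2, 4}  # consistent with the new searcher
```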


On 9/24/15, 11:28 AM, "Jeff Wartes"  wrote:

>
>If I configure my filterCache like this:
><filterCache ... autowarmCount="10"/>
>
>and I have <= 10 distinct filter queries I ever use, does that mean I’ve
>effectively disabled cache invalidation? So my cached filter query results
>will never change? (short of JVM restart)
>
>I’m unclear on whether autowarm simply copies the value into the new
>searcher’s cache or whether it tries to rebuild the results of the cached
>filter query based on the new searcher’s view of the data.
>



Re: Different ports for search and upload request

2015-09-24 Thread billnbell
Scary stuff.

If you did that, you'd better reload the core.

Bill Bell
Sent from mobile


> On Sep 24, 2015, at 5:05 PM, Siddhartha Singh Sandhu  
> wrote:
> 
> Thank you so much.
> 
> Safe to ignore the following (not a query):
> 
> *Never did this.* But how about this crazy idea:
> 
> Take an Amazon EFS and share it between two EC2 instances. Use one EC2 endpoint
> to update the index on EFS while the other reads from it. This way each EC2
> instance can use its own compute and not share its resources amongst Solr threads.
> 
> Regards,
> Sid.
> 
>> On Thu, Sep 24, 2015 at 5:17 PM, Shawn Heisey  wrote:
>> 
>>> On 9/24/2015 2:01 PM, Siddhartha Singh Sandhu wrote:
>>> I wanted to know if we can configure different ports as end points for
>>> uploading and searching API. Also, if someone could point me in the right
>>> direction.
>> 
>> From our perspective, no.
>> 
>> I have no idea whether it is possible at all ... it might be something
>> that a servlet container expert could figure out, or it might require
>> code changes to Solr itself.
>> 
>> You probably need another mailing list specifically for the container.
>> For virtually all 5.x installs, the container is Jetty.  In earlier
>> versions, it could be any container.
>> 
>> Another possibility would be putting an intelligent proxy in front of
>> Solr and having it only accept certain handler paths on certain ports,
>> then forward them to the common port on the Solr server.
>> 
>> If you did manage to do this, it would require custom client code.  None
>> of the Solr clients for programming languages have a facility for
>> separate ports.
>> 
>> Thanks,
>> Shawn
>> 
>> 


Re: Autowarm and filtercache invalidation

2015-09-24 Thread Erick Erickson
Jeff:

Yes, exactly. Otherwise the autowarming would be quite useless since
what's stored in the cache is the _lucene_ doc ID (either as a bitmap
or as a list of IDs). And the lucene doc ID can change when merging,
so the old IDs are useless.

Best,
Erick

On Thu, Sep 24, 2015 at 2:11 PM, Jeff Wartes  wrote:

> Answering my own question: Looks like the default filterCache regenerator
> uses the old cache to re-executes queries in the context of the new
> searcher and does nothing with the old cache value.
>
> So, the new searcher’s cache contents will be consistent with that
> searcher’s view, regardless of whether it was populated via autowarm.
>
>
> On 9/24/15, 11:28 AM, "Jeff Wartes"  wrote:
>
> >
> >If I configure my filterCache like this:
> ><filterCache ... autowarmCount="10"/>
> >
> >and I have <= 10 distinct filter queries I ever use, does that mean I’ve
> >effectively disabled cache invalidation? So my cached filter query results
> >will never change? (short of JVM restart)
> >
> >I’m unclear on whether autowarm simply copies the value into the new
> >searcher’s cache or whether it tries to rebuild the results of the cached
> >filter query based on the new searcher’s view of the data.
> >
>
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread billnbell
Can we add it back with a parameter at least ?

Bill Bell
Sent from mobile


> On Sep 24, 2015, at 8:58 AM, Yonik Seeley  wrote:
> 
>> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh  wrote:
>> our bibliographic index (~20M entries) runs fine with Solr 4.10.3
>> With Solr 5.3 faceted searching is constantly incredibly slow (~ 20 seconds)
> [...]
>> 
>> The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
>> 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
>> cumulative_hitratio of 1.
> 
> 
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.
> 
> This code had been evolved over years to be very fast for specific use
> cases.  No one facet algorithm is going to be optimal for everyone, so
> it's important we have multiple.  But use of the UnInvertedField was
> removed without any notification or discussion whatsoever (and
> obviously no benchmarking), and was only discovered later by Solr devs
> in SOLR-7190 that it was essentially dead code.
> 
> 
> When I brought back my "JSON Facet API" work to Solr (which was based
> on 4.10.x) it came with a heavily modified version of UnInvertedField
> that is available via the JSON Facet API.  It might currently work
> better for your usecase.
> 
> On your normal (non-docValues) index, you can try something like the
> following to see what the performance would be:
> 
> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
> json.facet={
>  authors : { type:terms, field:author_facet, limit:30 },
>  material_access : { type:terms, field:material_access, limit:30 },
>  material_brief : { type:terms, field:material_brief, limit:30 },
>  rvk : { type:terms, field:rvk_facet, limit:30 },
>  lang : { type:terms, field:language, limit:30 },
>  dept : { type:terms, field:department_3, limit:30 }
> }'
> 
> There were other changes in LUCENE-5666 that will probably slow down
> faceting on the single valued fields as well (so this may still be a
> little slower than 4.10.x), but hopefully it would be more
> competitive.
> 
> -Yonik


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Yonik Seeley
On Thu, Sep 24, 2015 at 9:58 AM, Yonik Seeley  wrote:
> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
> removed as part of LUCENE-5666, causing these performance regressions.

I did some performance benchmarks and opened an issue.  It's bad.
https://issues.apache.org/jira/browse/SOLR-8096

-Yonik


Re: Different ports for search and upload request

2015-09-24 Thread Susheel Kumar
I am not aware of such a feature in Solr but do want to know your use case
/ logic behind coming up with different ports.  If it is for security /
exposing to user, usually Solr shouldn't be exposed to user directly but
via application / service / api.

Thanks,
Susheel

On Thu, Sep 24, 2015 at 4:01 PM, Siddhartha Singh Sandhu <
sandhus...@gmail.com> wrote:

> Hi,
>
> I wanted to know if we can configure different ports as end points for
> uploading and searching API. Also, if someone could point me in the right
> direction.
>
> Regards,
>
> Sid.
>


Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Yonik Seeley
On Thu, Sep 24, 2015 at 10:16 AM, Alessandro Benedetti
 wrote:
> Yonik, I am really excited about the JSON faceting module.
> I find it really interesting.
> Are there any pros/cons to using it, or is it definitely the "approach of
> the future"?

Thanks!

The cons to the new stuff is that it doesn't yet have everything the
old stuff has.  But it does already have new stuff that the old stuff
doesn't have (like sorting by any statistic and rudimentary block-join
integration).

And yes, I do see it as "the future", a platform for integrating the
disparate features that have been developed for solr over time, but
don't always work that well together:
 - search
 - statistics
 - grouping
 - joins


> I saw your benchmarks and seems impressive.
>
> I have not read all the topic in details, just briefly, but is Json
> faceting using different faceting algorithms from the standard ones ? (
> Enum and fc)

I wouldn't say different fundamental algorithms yet... (compared to
4.10) but different code (to support some of the new features) and in
some places more optimized.

> I can not find the algorithm parameter to be passed in the Json facets.

There is an undocumented "method" parameter - I need to enable that to
allow switching between the docvalues approach and the UnInvertedField
approach.

-Yonik


> Are they using a complete different approach ?
> Is the algorithm used expressed anywhere ?
> This could give very good insights on when to use them.
>
> Cheers
>
> 2015-09-24 14:58 GMT+01:00 Yonik Seeley :
>
>> On Mon, Sep 21, 2015 at 8:09 AM, Uwe Reh 
>> wrote:
>> > our bibliographic index (~20M entries) runs fine with Solr 4.10.3
>> > With Solr 5.3 faceted searching is constantly incredibly slow (~ 20
>> seconds)
>> [...]
>> >
>> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
>> > 5.3. In Solr 4.10 the 'fieldValueCache' is in heavy use with a
>> > cumulative_hitratio of 1.
>>
>>
>> Indeed.  Use of the fieldValueCache (UnInvertedField) was secretly
>> removed as part of LUCENE-5666, causing these performance regressions.
>>
>> This code had been evolved over years to be very fast for specific use
>> cases.  No one facet algorithm is going to be optimal for everyone, so
>> it's important we have multiple.  But use of the UnInvertedField was
>> removed without any notification or discussion whatsoever (and
>> obviously no benchmarking), and was only discovered later by Solr devs
>> in SOLR-7190 that it was essentially dead code.
>>
>>
>> When I brought back my "JSON Facet API" work to Solr (which was based
>> on 4.10.x) it came with a heavily modified version of UnInvertedField
>> that is available via the JSON Facet API.  It might currently work
>> better for your usecase.
>>
>> On your normal (non-docValues) index, you can try something like the
>> following to see what the performance would be:
>>
>> $ curl http://yxz/solr/hebis/query -d 'q=darwin&
>> json.facet={
>>   authors : { type:terms, field:author_facet, limit:30 },
>>   material_access : { type:terms, field:material_access, limit:30 },
>>   material_brief : { type:terms, field:material_brief, limit:30 },
>>   rvk : { type:terms, field:rvk_facet, limit:30 },
>>   lang : { type:terms, field:language, limit:30 },
>>   dept : { type:terms, field:department_3, limit:30 }
>> }'
>>
>> There were other changes in LUCENE-5666 that will probably slow down
>> faceting on the single valued fields as well (so this may still be a
>> little slower than 4.10.x), but hopefully it would be more
>> competitive.
>>
>> -Yonik
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


How to know index file in OS Cache

2015-09-24 Thread Aman Tandon
Hi,

Is there any way to know whether the index files are present in the OS cache
or RAM? I want to check if the index is present in RAM or in the OS cache,
and which files are not in either of them.

With Regards
Aman Tandon